Digital communication system using real-time capacity achieving encoder design for channels with memory and feedback

ABSTRACT

A method for characterizing a capacity of a channel with memory and feedback comprises defining a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols. The method further includes determining a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization, and solving the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of the filing date of U.S. Provisional Patent Application Ser. No. 61/987,797, which was entitled “Real-Time Capacity Achieving Encoder Design For Channels With Memory And Feedback” and filed on May 2, 2014. The entire disclosure of this application is hereby expressly incorporated by reference herein for all uses and purposes.

TECHNICAL FIELD

This patent application relates generally to systems and methods for facilitating real-time communication, and more particularly, to determining properties of channel input distributions that characterize capacities of channels with memory, with feedback, with or without transmission cost, and without feedback (in certain cases in which feedback does not increase capacity), to designing encoders to achieve capacities of channels with memory, to characterizing nonanticipative rate distortion functions for sources of information with memory, and to utilizing characterized rate distortion functions and channel capacities in Joint Source-Channel Coding (JSCC).

BACKGROUND

Advancements in information technology and networks are transforming the everyday lives of many people with respect to employment, health care, communication, education, environment, etc. In particular, advancements in information technology and networks have spawned the field of Cyber-Physical Systems (CPSs), which field refers to the next generation of engineering systems integrated via advanced technologies and protocols. These engineering systems are capable of performing ubiquitous computing, communication, and control for complex physical systems and can be implemented in energy systems (e.g., the electric power distribution and smart grids), transportation systems (e.g., traffic networks), health care and medical systems, surveillance networks, control systems for underwater and unmanned aerial vehicles, etc. In many of these applications, sub-systems, sensors or observation posts, and controllers or control stations are distributed, often at distinct locations, and communication among sub-systems is limited. Thus, in the context of such systems, there is a demand for real-time communication, decentralized decisions, and the integration of real-time communication and decentralized decisions into complex networks.

In the field of communications, most encoders and decoders for transmitting information over channels (e.g., for transmitting speech signals over wireless communication channels) are designed based on an assumption that the channels do not have memory. That is, most communication systems are configured based on theories, methods, expressions, etc. assuming that channels of the communication system have a conditional output probability distribution that depends only on a current input (i.e., the output of the channels is conditionally independent of previous channel input or output symbols and the source symbols). However, typical communication channels are not memoryless due to Inter-symbol Interference (ISI), correlated channel noise, etc. As a result, most communication systems are configured with components (e.g., encoders) that do not operate optimally when transmitting information over channels (e.g., the components do not achieve the capacity of the channels with memory) and are, in many cases, overly complicated due to the lack of knowledge of capacity achieving properties of the encoders. A characterization of channel capacity and corresponding capacity achieving channel input distributions, which would allow for the design of capacity achieving encoders, is not known for most channels with memory.

Further, the field of communications has developed primarily based on the following model: a message is generated randomly by an information source, the message is encoded by an encoder, the message is transmitted as a signal over a noisy channel, and the transmitted signal is decoded to produce an output as an approximation of the message generated by the source. The fundamental problem of this model is to determine simultaneously what information should be transmitted (source coding) and how the information should be transmitted (channel coding) to achieve performance. Over the years, this fundamental problem has been separated into the two sub-problems of source coding and channel coding. The first sub-problem, source coding, is related to efficient representation of information, such as information representing speech, so as to minimize information storage and to characterization of a minimum rate of compressing the information generated by the source of the information (e.g., via the classical Rate Distortion Function (RDF) of the source subject to a fidelity of reconstruction). The second sub-problem, channel coding or “error correction coding,” is related to a correction of errors arising from channel noise, such as flaws in an information/data storage or transmission system, loss of information packets in networks, failures of communications links, etc., and to the characterization of the maximum rate of information transmission, called “channel capacity.”

The general separation of the fundamental problem into source coding and channel coding sub-problems has divided the community of developers into independent groups developing source codes and channel codes, respectively. Although extremely useful in some contexts, this idealized separation is limiting future advances in communication technology, in that developers are ignoring practical design criteria, such as computational complexity, delay, and optimal performance. Further, the ideal separation of source coding and channel coding is often violated for point-to-point communications over channels with memory and for network communication systems. On the other hand, the optimal design of simultaneously performing data compression and channel coding is, in those known cases, elegantly simple. However, this optimal design is, in general, hard to find. For example, separation of source and channel coding leads to the design of channel codes which treat all information bits as equally important. However, a scenario in which all information bits are equally important (e.g., in achieving an optimal channel capacity) is rare, and, hence, a separation of source and channel coding can lead to performance degradation.

SUMMARY

In an embodiment, a method for characterizing a capacity of a channel with memory and feedback comprises defining a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols. The method further includes determining a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization, and solving, by one or more processors of a specially configured computing device, the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions and a per unit time limit of the capacity of the channel.

In another embodiment, a system comprises one or more processors and one or more non-transitory memories. The one or more non-transitory memories store computer-readable instructions that specifically configure the system such that, when executed by the one or more processors, the computer-readable instructions cause the system to: receive a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols. The computer-readable instructions further cause the system to: determine a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization, and solve the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of an example communication system for encoding and transmitting messages generated by a source;

FIG. 2 illustrates an example process to characterize a capacity of a channel, such as the channel illustrated in FIG. 1;

FIG. 3 is a flow diagram of an example method for characterizing a capacity and optimal channel input distribution with a two-step procedure;

FIG. 4 is a flow diagram of an example method for determining subsets of channel input distributions and a corresponding capacity based on step one of a two-step procedure, such as the two-step procedure utilized in the method of FIG. 3;

FIG. 5 is a flow diagram of an example method for narrowing subsets of channel input distributions and determining a corresponding refined capacity based on step two of a two-step procedure, such as the two-step procedure utilized in the method of FIG. 3;

FIG. 6 illustrates an example process for designing an information lossless encoder, which encoder may be implemented in the system illustrated in FIG. 1;

FIG. 7 illustrates an example encoding scheme which may be implemented by an information lossless encoder designed according to the process illustrated in FIG. 6;

FIG. 8 is a flow diagram of an example method for designing a capacity achieving and information lossless encoder, which encoder may be implemented in the system illustrated in FIG. 1;

FIG. 9 is a flow diagram of an example method for determining and utilizing a rate distortion function, which function may be utilized to configure the system illustrated in FIG. 1;

FIG. 10A depicts a combination of components in a communication system 1000 that realizes a rate distortion function, such as a rate distortion function determined in the method of FIG. 9;

FIG. 10B depicts an example realization of a nonanticipative rate distortion function for a vector of independent sources with memory;

FIG. 11 depicts another example communication system in which encoders and decoders are designed according to joint source channel coding;

FIG. 12 is a flow diagram of an example method for identifying and utilizing an {encoder, decoder} pair, which pair may be utilized in the system illustrated in FIG. 1;

FIG. 13 illustrates a realization of a finite-time RDF;

FIG. 14 illustrates example {encoder, decoder} pairs that realize an RDF and a capacity of a channel; and

FIG. 15 is a block diagram of an example computing device.

DETAILED DESCRIPTION

The techniques of the present disclosure facilitate a characterization of capacity and capacity achieving channel input distributions for channels with memory, with or without feedback, and with transmission cost. Further, encoders of the present disclosure satisfy necessary and sufficient conditions to achieve the capacity of channels with memory, with or without feedback, and with transmission cost, and methods of the present disclosure include determining whether an optimal transmission, for a given model of channels and transmission cost, is indeed real-time transmission. Encoders of the present disclosure may also be configured such that the encoders simultaneously compress and encode information for sources with memory and/or with zero-delay, by performing a JSCC design.

To this end, a two-step procedure determines the structural properties of (i) capacity achieving encoders, and (ii) capacity achieving channel input distributions, for general channels with memory and with or without feedback encoding and transmission cost. More specifically, the two-step procedure identifies “multi-letter,” “finite block length” feedback capacity expressions along with the corresponding capacity achieving channel input distributions and encoders. By extension, the procedure identifies per unit time limiting feedback capacity formulas along with the corresponding capacity achieving channel input distributions and encoders.

Further, necessary and sufficient conditions allow for the design of an encoder with or without feedback that achieves the capacity of a channel with memory. These encoders, referred to herein as “information lossless” encoders, define a mapping from the information generated by a source to the output of the encoder. The mapping may be invertible such that no information is lost in the encoding process. For each of a plurality of channel classes, the determined capacity achieving input distributions, mentioned above, allow specific mappings (e.g., encoders) to be defined. These specific mappings adhere to the necessary and sufficient conditions for capacity achieving encoders.

Nonanticipative rate distortion functions (RDFs) of the present disclosure may achieve a zero-delay compression of information from a source. These nonanticipative RDFs may represent an optimal (e.g., capacity achieving) compression that is zero-delay. For example, the nonanticipative RDFs of the present disclosure may represent a scheme for compressing information that only depends on previously communicated symbols (e.g., is causal), not on all communicated symbols. The nonanticipative RDFs may also represent compression of both time-varying, or “nonstationary,” and stationary sources of information.

Still further, methods for designing encoding schemes may utilize Joint Source Channel Coding (JSCC) to generate encoding schemes for simultaneous compression and channel coding operating with zero-delay or in real-time. As opposed to having codes for compression and codes for channel coding of compressed information, encoding schemes of the present disclosure may utilize a single encoding scheme, generated via JSCC design, that provides both compression and channel coding with zero-delay. JSCC methods to design such encoding schemes may utilize characterized capacities, necessary and sufficient conditions of information lossless encoders, and nonanticipative RDFs as discussed above.

A. System Overview

FIG. 1 is a block diagram of an example communication system 100 including at least some components that may be configured according to and/or utilize the methods discussed herein. In the example system 100, a source 102 generates a message (e.g., including any suitable information). An encoder 104 encodes the generated message, and the encoded message is transmitted as a signal over a channel 106. A decoder 108 decodes the transmitted signal to produce an output.

The source 102 may include one or more stationary or mobile computing or communication devices, in an implementation. For example, the source 102 may include a laptop, desktop, tablet, or other suitable computer generating messages including digital data, such as digital data representing pictures, text, audio, videos, etc. In other examples, the source 102 may be a mobile phone, smartphone, land line phone, or other suitable phone generating messages including signals representative of audio or text messages to one or more other suitable phones or communication devices. Generally, the source 102 may include any number of devices or components of devices generating messages to be encoded by the encoder 104 and transmitted over the channel 106. Further details of an example computing device, which may be implemented as the source 102, are discussed with reference to FIG. 15.

In particular, the example source 102 generates messages including source symbols $x^n \triangleq \{x_0, x_1, \ldots, x_n\}$, $x_j \in \mathcal{X}_j$, where $j = 0, 1, \ldots, n$, according to a source distribution $P_{X^n}(dx^n)$. As discussed above, these symbols may represent any suitable information, such as photos, audio, video, etc. generated by or forwarded by the source 102.

The encoder 104 may include one or more devices, circuits, modules, engines, and/or routines communicatively and/or operatively coupled to the source 102. For example, the encoder 104 may be communicatively and/or operatively coupled to the source 102 via a bus of a computing device (e.g., a computing device implemented as the source 102), such as an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, a Peripheral Component Interconnect (PCI) bus or a Mezzanine bus, or the Peripheral Component Interconnect Express (PCI-E) bus, or the encoder 104 may be communicatively and/or operatively coupled to the source 102 via one or more wires or cables, such as Ethernet cables, ribbon cables, coaxial cables, etc. In any event, the encoder 104 receives messages (e.g., source symbols $x^n$) from the source 102.

The example encoder 104 encodes the received symbols $x^n$ from the source 102 into channel input symbols $a^n \triangleq \{a_0, a_1, \ldots, a_n\}$, $a_j \in \mathcal{A}_j$, where $j = 0, 1, \ldots, n$. These channel input symbols have “induced” channel input distributions $\{P_{A_i|B^{i-1},A^{i-1}}(da_i|b^{i-1},a^{i-1}) : i = 1, 2, \ldots, n\}$ if there is feedback or $\{P_{A_i|A^{i-1}}(da_i|a^{i-1}) : i = 1, 2, \ldots, n\}$ if there is no feedback encoding. Further details of encoders and the encoding of the generated source symbols are discussed with reference to FIGS. 6, 7, and 8.

The channel 106 may include any number of wires, cables, wireless transceivers, etc., such as Ethernet cables, configured to facilitate a transmission of the encoded source symbols from the encoder 104 to the decoder 108. Further details of various types of channels that may be implemented as the channel 106 are discussed in section B.1. entitled “Characterizing Channels.” The example channel 106 is noisy with memory defined by a sequence of conditional distributions, $\{P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) : i = 1, 2, \ldots, n\}$. The channel produces a channel output $b^n \triangleq \{b_0, b_1, \ldots, b_n\}$, $b_j \in \mathcal{B}_j$, where $j = 0, 1, \ldots, n$, which channel output is input to the decoder 108.

The decoder 108 may include one or more devices, modules, engines, and/or routines communicatively and/or operatively coupled to one or more computing devices, phones, etc. to receive decoded information transmitted over the channel 106. The decoder 108 receives the channel output, $B_j$, and decodes the channel output to produce the decoded output $Y_j$. The decoder 108 may or may not use past channel outputs and decoder outputs to produce $Y_j$. For example, when the source 102 is an originating phone, the decoder 108 may be communicatively and/or operatively coupled to a terminating phone to decode audio signals sent from the originating phone to the terminating phone.

In some implementations, the source 102 and/or encoder 104 may know (e.g., receive transmissions indicative of and/or store information indicative of) all, or at least some outputs ($B_j$) from the channel 106, before generating, encoding, and/or transmitting a next signal. This functionality may be referred to herein as “feedback.” In other words, a channel or communication system with feedback may be a system in which a source and/or encoder receive indications of, or otherwise “know,” all or at least some previous outputs from the channel before sending a subsequent signal over the channel.
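To make the feedback loop concrete, the following minimal Python sketch simulates one pass of a system like the system 100 with a hypothetical binary channel with unit memory; the kernel, the xor-based encoder rule, and the decoder rule are illustrative assumptions, not components specified by this disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel(b_prev, a):
    # Hypothetical unit-memory kernel: the probability that the channel
    # flips the current input a_i depends on the previous output b_{i-1}.
    p_flip = 0.1 if b_prev == 0 else 0.3
    return int(a) ^ int(rng.random() < p_flip)

b_prev = 0                         # initial channel output (null data)
for i in range(10):
    x = int(rng.integers(0, 2))    # source symbol x_i from the source 102
    a = x ^ b_prev                 # encoder 104 uses the fed-back b_{i-1}
    b = channel(b_prev, a)         # channel 106 produces output b_i
    y = b ^ b_prev                 # decoder 108 also tracks b_{i-1}
    b_prev = b                     # feedback: b_i returned to the encoder
```

The particular encoder and decoder here are placeholders; the point is only the order of operations, namely that each $a_i$ may depend on the fed-back output $b_{i-1}$.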

B. Characterization of Capacities and Identification of Capacity Achieving Channel Input Distributions and Capacity Achieving Encoders

The methods described below allow the “finite block length” feedback capacity for channels, such as the channel 106, with memory and with and without transmission cost constraints on the encoders to be characterized, in some implementations. “Feedback capacity” may refer to a capacity of a channel with feedback, as discussed above, and “finite block length” may refer to a feedback capacity defined in terms of a finite number of transmissions or a finite time period such that the feedback capacity can be defined without taking a limit of infinite transmissions. In characterizing the finite block length feedback capacity, operators of communications systems may also identify the corresponding capacity achieving channel input distributions and the capacity achieving encoders. Operators may also utilize the characterization to determine whether feedback increases the finite block length capacity of channels without feedback, to characterize the finite block length capacity without feedback for general channels with memory and with or without transmission cost constraints on the encoders, and to identify the corresponding capacity achieving channel input distributions and capacity achieving encoders. By extension, the methods described below may allow a per unit time limiting version of the finite block length feedback capacity to determine the capacity and capacity achieving channel input distributions.

In some implementations, the characterization of the finite block length feedback capacity may facilitate the design of optimal encoding and decoding schemes that both reduce the complexity of communication systems and operate optimally. Operating optimally may include an optimal operation in terms of the overall number of processing elements (e.g., CPUs) required to process transmissions and/or the number of memory elements and steps required to encode and decode messages. Such optimal encoding and decoding schemes may require small processing delays and short code lengths in comparison to encoding and decoding schemes designed based on an assumption of a channel without memory or based on a separate treatment of source codes and channel codes.

Architectures and methodologies discussed herein provide encoders and decoders based on channel characteristics, transmission cost requirements, and the characteristics of messages generated by a source (e.g., the source 102). Although the characterizations, encoders, distributions, etc. discussed below are described, by way of example, with reference to the example communication system 100, which system 100 is a point-to-point communication system, the characterizations, encoders, distributions, etc. may be applied to or implemented in systems other than point-to-point communication systems. For example, the characterizations, encoders, distributions, etc. of the present disclosure may be implemented in multi-user and network communication systems by repeating the procedures described below (for each user, for each node of a network, etc.).

B.1. Characterizing Channels

A determination and characterization of the capacity of a channel, such as the channel 106, may first include a characterization or description of the channel. The techniques of the present disclosure characterize a plurality of channel types with memory and with or without feedback allowing the capacity of these channels to be determined along with capacity achieving encoders and distributions, in some implementations. Generally, the characterization of a channel includes: (i) an “alphabet” defining the signal spaces (e.g., inputs and outputs) of a communication system; (ii) a “transmission cost function” defining a dependence of a rate of information transfer over the channel on the amount of energy or, more generally, any cost imposed on symbols transferred over the channel; and (iii) a model for the channel described by, for example, conditional distributions or stochastic discrete-time recursions in linear, nonlinear and state space form.

Channel input and output alphabets may, in some implementations, be complete separable metric spaces such as function spaces of finite energy signals. These metric spaces may include, by way of example and without limitation, continuous alphabets, countable alphabets, and finite alphabets, such as real-valued $\mathbb{R}^p$-dimensional and/or complex-valued $\mathbb{C}^p$-dimensional alphabets for channel output alphabets and real-valued $\mathbb{R}^q$-dimensional and/or complex-valued $\mathbb{C}^q$-dimensional alphabets for channel input alphabets, finite energy or power signals, and bounded energy signals defined on metric spaces.

Transmission cost functions may include, by way of example, nonlinear functions of past and present channel inputs and past channel outputs or conditional distributions. The transmission cost functions define a cost of transmitting certain symbols over a channel, which cost is generally not the same for all symbols. For example, an energy, or other cost, required to transmit one symbol may differ from an energy required to transmit another symbol. Operators of a communication system may determine transmission cost functions, to utilize in the methods discussed below, by sending certain information (e.g., diagnostic or configuration information) over a channel and measuring an output of the channel.

Similarly, by sending certain information over a channel, operators of a communication system may determine a channel model. The channel model may model the behavior of the channel including a dependence (e.g., non-linear dependence) of transmission on past channel inputs, outputs and noise processes of memory. By way of example, the channel model may be a linear channel model with arbitrary memory, a Gaussian channel model, a state space model, or an arbitrary conditional distribution defined on countable, finite channel input and output alphabets, or continuous alphabet spaces. For waveform signals (e.g., continuous speech signals) transferred over a channel, the channel model may be a non-linear differential equation, and, for quantized signals (e.g., zeros and ones) transferred over a channel, the channel model may be a conditional distribution.

Models of channels with memory and feedback may include, by way of example, the following channel conditional distributions defining certain classes of channels:

Class A.

1. $P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) = P_{B_i|B^{i-1},A_i}(db_i|b^{i-1},a_i)$, $i = 0, \ldots, n$.

2. $P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) = P_{B_i|B^{i-1},A_{i-L}^i}(db_i|b^{i-1},a_{i-L}^i)$, $i = 0, \ldots, n$.

Class B.

1. $P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) = P_{B_i|B_{i-1},A_i}(db_i|b_{i-1},a_i)$, $i = 0, \ldots, n$.

2. $P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) = P_{B_i|B_{i-M}^{i-1},A_i}(db_i|b_{i-M}^{i-1},a_i)$, $i = 0, \ldots, n$.

3. $P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) = P_{B_i|B_{i-M}^{i-1},A^i}(db_i|b_{i-M}^{i-1},a^i)$, $i = 0, \ldots, n$.

Class C.

1. $P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) = P_{B_i|B_{i-2}^{i-1},A_{i-1}^i}(db_i|b_{i-2}^{i-1},a_{i-1}^i)$, $i = 0, \ldots, n$.

2. $P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) = P_{B_i|B_{i-M}^{i-1},A_{i-L}^i}(db_i|b_{i-M}^{i-1},a_{i-L}^i)$, $i = 0, \ldots, n$.

where $\{L, M\}$ are nonnegative finite integers. The above example classes of channel conditional distributions may be induced by nonlinear channel models and linear time-varying Autoregressive models or by linear and nonlinear channel models expressed in state space form.
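As one concrete, hypothetical instance of a Class A.1 distribution with unit output memory, a channel with binary input and output alphabets can be stored as a conditional probability table; the numbers below are illustrative only, and longer windows of past inputs and outputs (Classes A.2, B, and C) would simply add indices to the table.

```python
import numpy as np

# Q[b_prev, a, b] approximates P(B_i = b | B_{i-1} = b_prev, A_i = a),
# a hypothetical binary instance of a Class A.1 channel distribution.
Q = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # conditioned on b_{i-1} = 0
    [[0.7, 0.3], [0.3, 0.7]],   # conditioned on b_{i-1} = 1
])
# Each conditional law must be a probability distribution over b.
assert np.allclose(Q.sum(axis=-1), 1.0)
```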

Classes of transmission cost functions may include, by way of example, the following:

Class A.

1. $\gamma_i(T^n a^i, T^n b^{i-1}) = \gamma_i^{A.1}(a_i, b^{i-1})$, $i = 0, \ldots, n$,

2. $\gamma_i(T^n a^i, T^n b^{i-1}) = \gamma_i^{A.2}(a_{i-N}^i, b^{i-1})$, $i = 0, \ldots, n$,

Class B.

1. $\gamma_i(T^n a^i, T^n b^{i-1}) = \gamma_i^{B.1}(a_i, b_{i-1})$, $i = 0, \ldots, n$,

2. $\gamma_i(T^n a^i, T^n b^{i-1}) = \gamma_i^{B.2}(a_i, b_{i-K}^{i-1})$, $i = 0, \ldots, n$,

3. $\gamma_i(T^n a^i, T^n b^{i-1}) = \gamma_i^{B.3}(a^i, b_{i-K}^{i-1})$, $i = 0, \ldots, n$,

Class C.

1. $\gamma_i(T^n a^i, T^n b^{i-1}) = \gamma_i^{C.1}(a_{i-1}^i, b_{i-2}^{i-1})$, $i = 0, \ldots, n$,

2. $\gamma_i(T^n a^i, T^n b^{i-1}) = \gamma_i^{C.2}(a_{i-N}^i, b_{i-K}^{i-1})$, $i = 0, \ldots, n$,

where $\{N, K\}$ are nonnegative finite integers.
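For illustration, the sketch below gives hypothetical instances of a Class B.1 cost (current input and previous output) and a Class A.2 cost (a window of inputs and all past outputs); the quadratic forms are assumptions chosen only to show the dependence structure.

```python
def gamma_B1(a_i, b_prev):
    # Class B.1: gamma_i(a_i, b_{i-1}); e.g., input power plus an
    # illustrative penalty that depends on the previous channel output.
    return a_i ** 2 + 0.1 * b_prev ** 2

def gamma_A2(a_window, b_past):
    # Class A.2: gamma_i(a_{i-N}^i, b^{i-1}); this toy instance charges
    # the average power of the input window and ignores b_past.
    return sum(a ** 2 for a in a_window) / len(a_window)

cost = gamma_B1(0.5, 1.0) + gamma_A2([0.5, -0.2, 0.1], [])
```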

If $M = 0$ in the above example classes of channels and transmission cost functions, then

$P_{B_i|B_{i-M}^{i-1},\bar{A}^i}(db_i|b_{i-M}^{i-1},\bar{a}^i)\big|_{M=0} \equiv P_{B_i|\bar{A}^i}(db_i|\bar{a}^i)$ for $\bar{A}^i \in \{A^i, A_{i-L}^i\}$, $i = 0, 1, \ldots, n$.

Hence, for $M = L = 0$,

$P_{B_i|B_{i-M}^{i-1},A_{i-L}^i}(db_i|b_{i-M}^{i-1},a_{i-L}^i)\big|_{M=L=0} \equiv P_{B_i|A_i}(db_i|a_i)$, $i = 0, 1, \ldots, n$

is a memoryless channel. Similarly, if $K = 0$ then

$\gamma_i^{C.2}(a_{i-N}^i, b_{i-K}^{i-1})\big|_{K=0} \equiv \gamma_i^{C.2}(a_{i-N}^i)$, $i = 0, \ldots, n$.

B.2. Capacity Without Feedback

For further clarification, various notations and expressions related to capacities without feedback are presented below. The problem of the capacity of channels with memory and without feedback includes a maximization of mutual information between channel input and output sequences over an admissible set of channel input distributions. The mutual information may be:

${I\left( {A^{n};B^{n}} \right)}\; \overset{\Delta}{=}{E_{A^{n},B^{n}}{\left\{ {\log \left( {\frac{{P_{A^{n},B^{n}}\left( {\cdot {, \cdot}} \right)}}{\left( {{P_{A^{n}}( \cdot )} \times {P_{B^{n}}( \cdot )}} \right)}\left( {A^{n},B^{n}} \right)} \right)} \right\}.}}$

The admissible set of channel input distributions may be:

$\mathcal{P}_{[0,n]}^{noFB} \triangleq \{P_{A_i|A^{i-1}}(da_i|a^{i-1}) : i = 0, 1, \ldots, n\}$

and the maximization of mutual information may be expressed as:

$C_{A^\infty;B^\infty}^{noFB} \triangleq \limsup_{n \rightarrow \infty} \frac{1}{n+1} C_{A^n;B^n}^{noFB}, \qquad C_{A^n;B^n}^{noFB} = \sup_{\mathcal{P}_{[0,n]}^{noFB}} I(A^n; B^n)$

where $P_{A^n,B^n}(da^n, db^n) = P_{B^n|A^n}(db^n|a^n) \otimes P_{A^n}(da^n)$ is referred to herein as “the joint distribution,” $P_{B^n}(db^n) = \int_{\mathcal{A}^n} P_{B^n|A^n}(db^n|a^n) \otimes P_{A^n}(da^n)$ is referred to herein as “the channel output distribution,” $P_{B^n|A^n}(db^n|a^n) = \otimes_{i=0}^n P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i)$, for almost all $(a^n, b^n)$, is referred to herein as “the channel distribution” because the encoder does not utilize feedback, and $E_{A^n,B^n}\{\cdot\}$ denotes the expectation over the joint distribution of $\{A^n, B^n\}$.
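For finite alphabets, the mutual information above reduces to a finite sum over the joint distribution; the following sketch assumes $P_{A^n,B^n}$ is small enough to store as a matrix indexed by whole input and output sequences, an assumption made purely for illustration.

```python
import numpy as np

def mutual_information(P_joint):
    """I(A^n; B^n) in nats, from a joint matrix P_joint[a, b]."""
    P_A = P_joint.sum(axis=1, keepdims=True)   # marginal P_{A^n}
    P_B = P_joint.sum(axis=0, keepdims=True)   # marginal P_{B^n}
    mask = P_joint > 0                         # 0 log 0 = 0 by convention
    return float((P_joint[mask] *
                  np.log(P_joint[mask] / (P_A @ P_B)[mask])).sum())

# A noiseless binary channel carries I(A; B) = log 2 nats.
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))
```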

Further, the extremum problems of capacity of channels with memory without feedback, when transmission cost is imposed on channel input distributions, may be written, by way of example, as:

${{C_{A^{\infty};B^{\infty}}^{noFB}(\kappa)}\mspace{14mu} \mspace{14mu} \limsup\limits_{n\rightarrow\infty}\frac{1}{n + 1}C_{A^{n};B^{n}}^{noFB}},{{C_{A^{n};B^{n}}^{noFB}(\kappa)} = {{I\left( {A^{n};B^{n}} \right)}}}$where [ 0 , n ] noFB  ( κ )     { P A i | A i - 1  ( da i | a i -1 ) , i = 0 , … , n  :  1 n + 1  E  { ∑ i = 0 n   γ i  { T n  Ai , T n  B n - 1 ) } ≤ κ ) , κ ∈ [ 0 , ∞ ) andT^(n)a^(i) ⊆ {a₀, a₁, …, a_(i)}, T^(n)b^(i − 1) ⊆ {b₀, b₁, …, b_(i − 1)}, i = 0, 1, …, n

In this notation, $C_{A^n;B^n}^{noFB}$ is referred to herein as the finite block length capacity without feedback, $C_{A^\infty;B^\infty}^{noFB}$ is referred to herein as the capacity without feedback, $C_{A^n;B^n}^{noFB}(\kappa)$ is referred to herein as the finite block length capacity without feedback with transmission cost, and $C_{A^\infty;B^\infty}^{noFB}(\kappa)$ is referred to herein as the capacity without feedback with transmission cost.

B.3. Capacity with Feedback

For still further clarification, various notations and expressions related to capacities with feedback are presented below. The problem of the capacity of channels with memory and with feedback includes a maximization of directed information from channel input sequences to channel output sequences over an admissible set of channel input distributions. The directed information may be written as:

${{{I\left( A^{n}\rightarrow B^{n} \right)}\mspace{14mu} \mspace{14mu} {\sum\limits_{i = 0}^{n}\; {I\left( {A^{i};\left. B_{i} \middle| B^{i - 1} \right.} \right)}}} = {\sum\limits_{i = 0}^{n}\; {E_{A^{i},B^{i}}\left\{ {\log \left( {\frac{{dP}_{{B_{i}|B^{i - 1}},A^{i}}\left( {{\cdot \left| B^{i - 1} \right.},A^{i}} \right)}{{dP}_{B_{i}|B^{i - 1}}\left( {\cdot \left| B^{i - 1} \right.} \right)}\left( B_{i} \right)} \right)} \right\}}}},$

and the admissible set of channel input distributions may be expressed as:

$\mathcal{P}_{[0,n]} \triangleq \{P_{A_i|A^{i-1},B^{i-1}}(da_i|a^{i-1},b^{i-1}) : i = 0, 1, \ldots, n\}.$

The maximization of directed information may be expressed as:

$C_{A^\infty \rightarrow B^\infty}^{FB} \triangleq \limsup_{n \rightarrow \infty} \frac{1}{n+1} C_{A^n \rightarrow B^n}^{FB}, \qquad C_{A^n \rightarrow B^n}^{FB} \triangleq \sup_{\mathcal{P}_{[0,n]}} I(A^n \rightarrow B^n)$

For each $i = 0, 1, \ldots, n$, $\{P_{A^i,B^i}(da^i, db^i), P_{B_i|B^{i-1}}(db_i|b^{i-1})\}$ are referred to herein as the joint and conditional distributions induced by the channel and the channel input distributions, and $E_{A^i,B^i}\{\cdot\}$ denotes the expectation over the joint distribution of $\{A^i, B^i\}$.
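For small $n$ and finite alphabets, the directed information can be computed term by term from the joint distribution of the sequences; the sketch below represents that distribution as a dictionary keyed by (input sequence, output sequence) pairs, an encoding assumed only for this example.

```python
import numpy as np
from collections import defaultdict

def directed_information(P):
    """I(A^n -> B^n) in nats; P maps ((a_0..a_n), (b_0..b_n)) -> prob."""
    T = len(next(iter(P))[0])            # number of transmissions, n + 1
    total = 0.0
    for i in range(T):
        joint = defaultdict(float)       # p(a^i, b^i)
        num = defaultdict(float)         # p(a^i, b^{i-1})
        den = defaultdict(float)         # p(b^i)
        for (a, b), p in P.items():
            joint[(a[:i + 1], b[:i + 1])] += p
            num[(a[:i + 1], b[:i])] += p
            den[b[:i + 1]] += p
        hist = defaultdict(float)        # p(b^{i-1})
        for b_seq, p in den.items():
            hist[b_seq[:-1]] += p
        for (a, b), p in joint.items():
            if p > 0:
                # p * log [ p(b_i | b^{i-1}, a^i) / p(b_i | b^{i-1}) ]
                total += p * np.log((p / num[(a, b[:-1])]) /
                                    (den[b] / hist[b[:-1]]))
    return total
```

As a sanity check under these assumptions, Massey's identity gives $I(A^n \rightarrow B^n) = I(A^n; B^n)$ when the channel is used without feedback, so the result can be compared against the mutual information sketch above.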

Further, the extremum problems of capacity of channels with memory and with feedback, when transmission cost is imposed on channel input distributions, may be written, by way of example, as:

$C_{A^\infty \rightarrow B^\infty}^{FB}(\kappa) \triangleq \limsup_{n \rightarrow \infty} \frac{1}{n+1} C_{A^n \rightarrow B^n}^{FB}(\kappa), \qquad C_{A^n \rightarrow B^n}^{FB}(\kappa) \triangleq \sup_{\mathcal{P}_{[0,n]}(\kappa)} I(A^n \rightarrow B^n)$

where

$\mathcal{P}_{[0,n]}(\kappa) \triangleq \left\{P_{A_i|A^{i-1},B^{i-1}}, i = 0, \ldots, n : \frac{1}{n+1} E\left\{\sum_{i=0}^n \gamma_i(T^n A^i, T^n B^{i-1})\right\} \leq \kappa\right\}.$

In this notation, $C_{A^n \rightarrow B^n}^{FB}$ is referred to herein as the finite block length feedback capacity, $C_{A^\infty \rightarrow B^\infty}^{FB}$ is referred to herein as the feedback capacity, $C_{A^n \rightarrow B^n}^{FB}(\kappa)$ is referred to herein as the finite block length feedback capacity with transmission cost, and $C_{A^\infty \rightarrow B^\infty}^{FB}(\kappa)$ is referred to herein as the feedback capacity with transmission cost.

B.4. Characterization of Capacities and Identification of Optimal Channel Input Distributions Overview

FIG. 2 illustrates an example process 200 for characterizing capacities and identifying optimal channel input distributions. The capacities characterized via the process 200 may include finite block length feedback capacities, feedback capacities, finite block length feedback capacities with transmission cost, feedback capacities with transmission cost, finite block length capacities without feedback, capacities without feedback, finite block length capacities without feedback and with transmission cost, and capacities without feedback and with transmission cost. These types of capacities are further described in sections B.2. and B.3. entitled “Capacity Without Feedback” and “Capacity With Feedback,” respectively. A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured (e.g., by one or more algorithms, routines, modules, or engines) to implement at least a portion of the process 200. Further, components of the example communication system 100, such as the encoder 104, may be configured according to the output of the process 200.

In the process 200, a channel model 202 and a transmission cost function 204 are input into a two-step characterization procedure 206. The channel model 202 and transmission cost function 204 may describe one of the plurality of channel classes and transmission cost function classes discussed in section B.1. entitled “Characterizing Channels.” In particular, the channel model 202 may describe the noise of a channel and dependencies amongst channel inputs and channel outputs via one or more conditional distributions and/or one or more non-linear differential equations. The transmission cost function 204 may define a cost of sending each of a plurality of symbols (e.g., defined in an alphabet) over the channel described by the channel model 202.

The two-step characterization procedure 206 may, based on the channel model 202 and a transmission cost function 204, produce an optimal channel input distribution 208 and a finite block length feedback capacity 210, in the scenario illustrated in FIG. 2. Although the finite block length feedback capacity 210 is illustrated in FIG. 2, the two-step characterization procedure may characterize other capacities, such as $C_{A^n \rightarrow B^n}^{FB}(\kappa)$, $C_{A^\infty \rightarrow B^\infty}^{FB}(\kappa)$, $C_{A^n;B^n}^{noFB}$, or $C_{A^n;B^n}^{noFB}(\kappa)$, depending on the class of channel described by the channel model 202 and the class of the transmission cost function 204. Further details of an implementation of the two-step characterization procedure 206 are discussed with reference to FIGS. 3, 4, and 5. Generally, a first step of the two-step characterization procedure 206 may include utilizing stochastic optimal control to identify subsets of channel input distributions, and a second step of the two-step characterization procedure 206 may include utilizing a variational equality to further narrow the subsets of channel input distributions to the optimal channel input distribution 208.

In some implementations, the process 200 also includes a per unit time conversion 212 of the optimal channel input distribution 208 and the finite block length feedback capacity 210. This per unit time conversion 212 converts the finite block length feedback capacity 210 to a feedback capacity 214, which feedback capacity 214 describes the capacity of the channel as a number of transmissions approaches infinity. The per unit time conversion 212 additionally generates a feedback capacity achieving input distribution 216 corresponding to the feedback capacity 214. Although the feedback capacity 214 is illustrated in FIG. 2, the per unit time conversion 212 may generate other capacities, such as $C_{A^\infty \rightarrow B^\infty}^{FB}(\kappa)$, $C_{A^\infty;B^\infty}^{noFB}$, or $C_{A^\infty;B^\infty}^{noFB}(\kappa)$, depending on the class of channel described by the channel model 202 and the class of the transmission cost function 204. Further details of implementations of the per unit time conversion 212 are discussed below.

B.5. Two-Step Procedure for Characterizing Capacity and IdentifyingOptimal Input Distributions

As discussed with reference to FIG. 2, the two-step characterization procedure 206 may, based on a channel model and a transmission cost function, produce the optimal channel input distribution 208 and a specific characterization of (e.g., a formula of) a capacity, such as the finite block length feedback capacity 210. In an implementation, step one of the two-step characterization procedure 206 utilizes techniques from stochastic optimal control with relaxed or randomized strategies (e.g., conditional distributions), and step two of the two-step characterization procedure 206 may utilize a variational equality of directed information, as described further below.

Given a specific channel distribution and a specific transmission cost function from the classes described in section B.1. entitled “Characterizing Channels,” step one of the two-step characterization procedure 206 may include applying stochastic optimal control to show that a certain joint process, which generates the information structure of the channel input distribution at each time, is an extended Markov process. For example, for the channel input distribution (at every time) describing a channel with memory:

$P_{A_i|A^{i-1},B^{i-1}}(da_i|a^{i-1},b^{i-1}),$

the joint process generating the information structure

$\mathcal{I}_i^P \subset \{a^{i-1}, b^{i-1}\}, \quad \text{for } i = 0, 1, \ldots, n,$

is an extended Markov process, with respect to a smaller information structure

$\mathcal{I}_i^P \subset \{a^{i-1}, b^{i-1}\},$

for $i = 0, 1, \ldots, n$. Based on this joint process, the optimal channel input distribution corresponding to $C_{A^n \rightarrow B^n}^{FB}$ and $C_{A^n \rightarrow B^n}^{FB}(\kappa)$ is included in specific subsets of input distributions:

$\overline{\mathcal{P}}_{[0,n]} \subset \{P_i(da_i|a^{i-1},b^{i-1}) : i = 0, \ldots, n\} \in \mathcal{P}_{[0,n]}$ and $\overline{\mathcal{P}}_{[0,n]}(\kappa) \subset \mathcal{P}_{[0,n]}(\kappa).$

Thus, step one of the two-step characterization procedure 206 narrows all possible channel input distributions to specific subsets of input distributions, where the subsets of input distributions include the optimal channel input distribution.

Further, in some implementations of step one of the two-step characterization procedure 206, the first step may include a characterization of capacity (e.g., finite block length feedback capacity) corresponding to the determined subsets of input distributions. In particular, step one of the two-step characterization procedure 206 may include generating a formula or other expression representing a capacity corresponding to the determined subsets of input distributions. For the channel input distribution describing a channel with memory (discussed above), the formulas for finite block length feedback capacity and feedback capacity with and without transmission cost are:

$\overline{C}_{A^\infty \rightarrow B^\infty}^{FB} = \limsup_{n \rightarrow \infty} \frac{1}{n+1} \overline{C}_{A^n \rightarrow B^n}^{FB}, \qquad \overline{C}_{A^n \rightarrow B^n}^{FB} \triangleq \sup_{\overline{\mathcal{P}}_{[0,n]}} I(A^n \rightarrow B^n)$

$\overline{C}_{A^\infty \rightarrow B^\infty}^{FB}(\kappa) = \limsup_{n \rightarrow \infty} \frac{1}{n+1} \overline{C}_{A^n \rightarrow B^n}^{FB}(\kappa), \qquad \overline{C}_{A^n \rightarrow B^n}^{FB}(\kappa) \triangleq \sup_{\overline{\mathcal{P}}_{[0,n]}(\kappa)} I(A^n \rightarrow B^n)$

where $I(A^n \rightarrow B^n) = I(\pi, P)$ is a specific functional of the channel input distribution $\pi \in \overline{\mathcal{P}}_{[0,n]}$ and the channel conditional distribution $P \in \{$Class A, Class B, Class C$\}$ as described in section B.1. entitled “Characterizing Channels.”

Step two of the two-step characterization procedure 206 includes applying a variational equality of directed information to the subsets of input distributions determined in step one. In this manner, step two of the two-step characterization procedure 206 further narrows the determined subsets of input distributions, in some implementations. In particular, for the example above involving a channel with memory, an upper bound that is achievable over further narrowed subsets of the determined input distributions is expressed as:

$\mathring{\mathcal{P}}_{[0,n]} \subset \overline{\mathcal{P}}_{[0,n]}$ and $\mathring{\mathcal{P}}_{[0,n]}(\kappa) \subset \overline{\mathcal{P}}_{[0,n]}(\kappa).$

Based on such an upper bound or further narrowing of input distributions, step two may further include determining a refined capacity based on the narrowed input distributions. For the example above of a channel with memory, the characterization of finite block length feedback capacity and feedback capacity with and without transmission cost, obtained from step two, is:

$\mathring{C}_{A^\infty \rightarrow B^\infty}^{FB} = \limsup_{n \rightarrow \infty} \frac{1}{n+1} \mathring{C}_{A^n \rightarrow B^n}^{FB}, \qquad \mathring{C}_{A^n \rightarrow B^n}^{FB} \triangleq \sup_{\mathring{\mathcal{P}}_{[0,n]}} I(A^n \rightarrow B^n)$

$\mathring{C}_{A^\infty \rightarrow B^\infty}^{FB}(\kappa) = \limsup_{n \rightarrow \infty} \frac{1}{n+1} \mathring{C}_{A^n \rightarrow B^n}^{FB}(\kappa), \qquad \mathring{C}_{A^n \rightarrow B^n}^{FB}(\kappa) \triangleq \sup_{\mathring{\mathcal{P}}_{[0,n]}(\kappa)} I(A^n \rightarrow B^n)$

where $I(A^n \rightarrow B^n) = I(\dot{\pi}, P)$ is a specific functional of the channel input distribution $\dot{\pi} \in \mathring{\mathcal{P}}_{[0,n]}$ and the channel conditional distribution $P \in \{$Class A, Class B, Class C$\}$ as described in section B.1. entitled “Characterizing Channels.”

FIG. 3 is a flow diagram of an example method 300 for characterizing a capacity and optimal channel input distribution with a two-step procedure, such as the two-step procedure 206. A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured to implement at least a portion of the method 300. Further, in some implementations, a suitable combination of a computing device and an operator of a communication system, such as the communication system 100, may implement the method 300.

In the method 300, a computing device or operator receives a channel model and transmission cost function (block 302). The received channel model may include a channel conditional distribution, such as one of the channel conditional distributions $P \in \{$Class A, Class B, Class C$\}$ as described in section B.1. entitled “Characterizing Channels,” or the channel model may include other types of functions, such as non-linear differential equations, for channels transmitting continuous signals (e.g., speech signals). Also, the transmission cost function may include one of the transmission cost functions as described in section B.1. entitled “Characterizing Channels.”

The computing device and/or operator then applies step one of a two-step procedure (block 304). In step one, the computing device and/or operator utilizes stochastic optimal control to determine subsets of input distributions and a corresponding capacity formula based on the received channel model and transmission cost function. The determined subsets of input distributions may include the optimal channel input distributions. That is, the determined subsets of input distributions may include the channel input distribution that achieves the corresponding capacity of the channel. Further details of step one of the two-step procedure are discussed with reference to FIG. 4.

For certain channel models and transmission cost functions, step one may be sufficient to identify the information structures of channel input distributions and characterize a capacity (e.g., the finite block length feedback capacity and the feedback capacity with and without transmission cost) of the channel. In other cases, step one may only be an intermediate step before step two of the two-step procedure. More specifically, step one may be sufficient for channel conditional distributions which only depend on all previous channel outputs and costs (e.g., not a current and/or previous channel input). As such, the method 300 includes determining if the channel, described by the received channel model, is dependent on more than the previous channel outputs and costs (block 306). If the channel is only dependent on all previous channel outputs and costs, the flow continues to block 310. If the channel is dependent on more than the previous channel outputs and costs (e.g., channel inputs), the flow continues to block 308.

At block 308, the computing device and/or operator applies step two of the two-step procedure. In step two, the computing device and/or operator utilizes a variational equality to further narrow the subsets of input distributions determined at block 304. Further, the computing device and/or operator determines a refined capacity formula, such as a finite block length feedback capacity, based on the further narrowed subsets of input distributions. In some implementations, the further narrowing of the input distributions based on the variational equality includes identifying a single optimal channel input distribution, and, in other implementations, the further narrowing of the input distributions based on the variational equality includes identifying a smaller subset of channel input distributions including fewer distributions than the subsets determined at block 304.

The computing device and/or operator solves the capacity formula (determined at block 304) and/or refined capacity formula (determined at block 308) to determine the capacity of the channel (block 310). The capacity formulas solved at block 310 may include maximizations or other extremum problems, as further discussed above. The computing device and/or operator may utilize various techniques to solve these maximizations or extremum problems including, by way of example and without limitation, dynamic programming and/or stochastic calculus of variations (e.g., the stochastic maximum principle).
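As a hedged illustration of solving such an extremum problem numerically, the sketch below maximizes $\sum_{i=0}^n I(A_i; B_i|B^{i-1})$ for a hypothetical binary channel whose kernel depends only on the previous output, using backward dynamic programming over that output and a grid discretization of each $\pi_i(\cdot|b_{i-1})$; the kernel, horizon, grid, and the restriction of $\pi_i$ to unit memory are all assumptions made for the example, not a prescription of this disclosure.

```python
import numpy as np

# Hypothetical binary kernel: Q[b_prev, a, b] = P(B_i = b | b_prev, a).
Q = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.7, 0.3], [0.3, 0.7]]])
grid = np.linspace(0.0, 1.0, 101)   # candidates for pi(a = 1 | b_prev)
n = 9                               # transmissions i = 0, ..., n

V = np.zeros(2)                     # value-to-go, indexed by b_{i-1}
for i in range(n, -1, -1):          # backward dynamic programming
    V_new = np.zeros(2)
    for b in (0, 1):
        best = -np.inf
        for p1 in grid:
            pi = np.array([1.0 - p1, p1])   # pi_i(a | b_{i-1} = b)
            Pb = pi @ Q[b]                  # induced P(b_i | b_{i-1} = b)
            # Stage pay-off: I(A_i; B_i | b_{i-1} = b) under pi.
            info = (pi[:, None] * Q[b] * np.log(Q[b] / Pb)).sum()
            best = max(best, info + Pb @ V)
        V_new[b] = best
    V = V_new

print("finite block length feedback capacity estimate (nats):", V[0])
```

Under these assumptions, the maximizing grid point at each stage plays the role of the capacity achieving $\pi_i(\cdot|b_{i-1})$.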

FIG. 4 is a flow diagram of an example method 400 for determining subsets of channel input distributions and a corresponding capacity based on step one of the two-step procedure (discussed with reference to FIG. 3). A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured to implement at least a portion of the method 400. Further, in some implementations, a suitable combination of a computing device and an operator of a communication system, such as the communication system 100, may implement the method 400.

In the method 400, a computing device and/or operator determines, for a specific channel described by a channel model, a process which generates a structure of the channel input distribution (block 402). The computing device and/or operator may utilize techniques from stochastic optimal control to optimize a “pay-off” (e.g., a finite block length capacity of the specific channel) over all possible processes (e.g., distributions). As discussed above for an example channel with memory, the determination of the process may include demonstrating that a certain joint process which generates the information structure of the channel input distribution at each time is an extended Markov process.

The computing device and/or operator may then determine a smaller set of processes (e.g., distributions) optimizing the capacity (block 404). This smaller set of distributions or processes includes the optimal distribution or process that achieves the finite block length capacity, or “pay-off.” For example, the smaller set of distributions may include only some of a complete set of possible channel input distributions, where the smaller set, or subset, includes the optimal channel input distribution.

The computing device and/or operator also determines a capacity formula based on the determined subset of processes or distributions (block 406). As discussed further in sections B.2. and B.3. entitled “Capacity Without Feedback” and “Capacity With Feedback,” respectively, formulas for capacities may be expressed in terms of channel input distributions. Thus, upon determining subsets of channel input distributions at block 404, the computing device and/or operator may generate a formula for capacity, for the specific channel, based on the subsets of channel input distributions.

FIG. 5 is a flow diagram of an example method 500 for narrowing subsets of channel input distributions and determining a corresponding refined capacity based on step two of the two-step procedure (discussed with reference to FIG. 3). A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured to implement at least a portion of the method 500. Further, in some implementations, a suitable combination of a computing device and an operator of a communication system, such as the communication system 100, may implement the method 500.

In the method 500, a computing device and/or operator applies a variational equality to subsets of input distributions (block 502). For example, the computing device and/or operator may apply the variational equality of directed information further described in C. D. Charalambous et al., “Directed information on abstract spaces: Properties and variational equalities,” http://arxiv.org/abs/1302.3971, submitted Feb. 16, 2013. Such an application of a variational equality may generate an upper bound over the subsets of input distributions (block 504). That is, the application of the variational equality further narrows subsets of input distributions (e.g., determined according to the example method 400) based on techniques specific to information theory (e.g., directed information). The computing device and/or operator also determines a refined capacity formula based on the further narrowed subsets of input distributions (block 506).

B.6. Example Characterizations of Capacities and Identification of Channel Input Distributions for Channels in Class A

By way of example, example finite block length feedback capacity formulas and input distributions, for example classes of channels, determined according to the two-step procedure (described with reference to FIGS. 2, 3, 4, and 5) are presented below. The corresponding feedback capacities with and without transmission cost are limiting versions of the finite block length feedback capacities presented below.

B.6.1. Class A.1.

For an example channel conditional distribution,

$\{P_{B_i|B^{i-1},A_i}(db_i|b^{i-1},a_i) : i = 0, 1, \ldots, n\},$

the optimal channel input distribution corresponding to the finite block length feedback capacity $C_{A^n \rightarrow B^n}^{FB}$ is included in the subset:

$\overline{\mathcal{P}}_{[0,n]}^{A.1} \triangleq \{P_{A_i|B^{i-1}}(da_i|b^{i-1}) : i = 0, \ldots, n\} \subset \mathcal{P}_{[0,n]}.$

As such, for each $i = 0, 1, \ldots, n$, the information structure of the maximizing channel input distribution (according to the example two-step procedure described in section B.5. entitled “Two-Step Procedure for Characterizing Capacity and Identifying Optimal Input Distributions”) is:

$\mathcal{I}_i^P \triangleq \{b^{i-1}\} \subset \{a^{i-1}, b^{i-1}\}.$

The characterization of the finite block length feedback capacity is then

$C_{A^n \rightarrow B^n}^{FB,A.1} \triangleq \sup_{\overline{\mathcal{P}}_{[0,n]}^{A.1}} \sum_{i=0}^n \int \log\left(\frac{dP_{B_i|B^{i-1},A_i}(\cdot|b^{i-1},a_i)}{dP_{B_i|B^{i-1}}(\cdot|b^{i-1})}(b_i)\right) P_{B^i,A_i}(db^i, da_i) = \sup_{\overline{\mathcal{P}}_{[0,n]}^{A.1}} \sum_{i=0}^n I(A_i; B_i|B^{i-1})$

where

$P_{B^i,A_i}(db^i, da_i) = P_{B_i|B^{i-1},A_i}(db_i|b^{i-1},a_i) \otimes P_{A_i|B^{i-1}}(da_i|b^{i-1}) \otimes P_{B^{i-1}}(db^{i-1}), \quad i = 0, 1, \ldots, n,$

and

$P_{B_i|B^{i-1}}(db_i|b^{i-1}) = \int P_{B_i|B^{i-1},A_i}(db_i|b^{i-1},a_i) \otimes P_{A_i|B^{i-1}}(da_i|b^{i-1}), \quad i = 0, 1, \ldots, n.$

If a transmission cost, such as

$\gamma_i^{A.1}(a_i, b^{i-1}), \quad \gamma_i^{B.1}(a_i, b_{i-1}), \quad \gamma_i^{B.2}(a_i, b_{i-K}^{i-1}), \quad i = 0, 1, \ldots, n,$

is imposed, then the example characterization of the finite block length feedback capacity is:

$C_{A^n \rightarrow B^n}^{FB,A.1}(\kappa) = \sup_{\overline{\mathcal{P}}_{[0,n]}^{A.1}(\kappa)} \sum_{i=0}^n I(A_i; B_i|B^{i-1}).$

B.6.1.1. Example Non-Linear and Linear Channel Models

Channel distributions of example class A.1 (as defined in section B.1. entitled “Characterizing Channels”) may include one or both of distributions defined on finite and countable alphabet spaces and distributions defined on continuous alphabet spaces, which distributions may be induced by the models described below.

Let

$\mathcal{A}_i \triangleq \mathbb{R}^q, \quad \mathcal{B}_i \triangleq \mathbb{R}^p, \quad \mathcal{V}_i \triangleq \mathbb{R}^r, \quad h_i : \mathcal{B}^{i-1} \times \mathcal{A}_i \times \mathcal{V}_i \mapsto \mathcal{B}_i, \quad i = 0, 1, \ldots, n$

where $\{h_i(\cdot,\cdot,\cdot) : i = 0, 1, \ldots\}$ are measurable functions, and $\{V_i : i = 0, 1, \ldots\}$ is a random channel noise process with joint distribution $P_{V^n}(dv^n)$ on $\mathcal{B}(\mathcal{V}^n)$.

A recursive expression as follows may define an example nonlinear channel model, for a channel in the example class A.1, with a continuous alphabet:

$B_i = h_i(B^{i-1}, A_i, V_i), \quad i = 0, \ldots, n$

where transmission in the example model begins at time $i = 0$, and the initial data

$B^{-1} \triangleq b_{-\infty}^{-1}, \quad A_0 \triangleq a_0, \quad V^{-1} \triangleq v^{-1}$

are specified according to the convention utilized in the model. For example, this data may be taken to be the null set of data or any available data prior to transmission time $i \in \{-\infty, \ldots, -2, -1\}$.

If the channel noise process $\{V_i : i = 0, 1, \ldots\}$ is an independent sequence (e.g., not necessarily stationary), then the above recursive expression for the example nonlinear channel model may be utilized in step one of the two-step process with the channel probability distribution

$\mathbb{P}\{B_i \in \Gamma \mid B^{i-1} = b^{i-1}, A^i = a^i\} = \mathbb{P}\{V_i : h_i(b^{i-1}, a_i, V_i) \in \Gamma\} = Q_i(\Gamma \mid b^{i-1}, a_i), \quad \Gamma \in \mathcal{B}(\mathcal{B}_i), \quad i = 0, 1, \ldots, n.$

Another recursive expression as follows may define an example linear channel model, for a channel in the example class A.1:

$$B_i = -\sum_{j=0}^{i-1} C_{i,j}B_j + D_{i,i}A_i + V_i,\quad i=0,\ldots,n,$$

where, for each i=0, 1, \ldots, n, the coefficients $\{C_{i,j}, D_{i,i}: i=0,1,\ldots,n,\ j=0,1,\ldots,i-1\}$ are real-valued matrices with dimensions p by p and p by q, respectively (e.g., with $\{p,q\}$ being positive integers).
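By way of illustration only, a short Python sketch of a scalar (p = q = 1) instance of this linear model follows; the geometric coefficient choice $C_{i,j} = c^{\,i-j}$, the value $D_{i,i} = d$, and the unit-variance Gaussian noise are hypothetical values chosen for the example.

```python
import numpy as np

# Sketch: simulate B_i = -sum_{j<i} C_{i,j} B_j + D_{i,i} A_i + V_i for scalars.
# C_{i,j} = c**(i-j) and D_{i,i} = d are illustrative assumptions.
rng = np.random.default_rng(1)
n, c, d = 8, 0.3, 1.0
A = rng.normal(size=n + 1)          # an arbitrary channel input sequence
B = np.zeros(n + 1)
for i in range(n + 1):
    memory = sum(c ** (i - j) * B[j] for j in range(i))   # output memory term
    B[i] = -memory + d * A[i] + rng.normal()              # V_i ~ N(0, 1)
print(np.round(B, 3))
```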

With such a linear model, the channel distribution is obtained from

$$\mathbb{P}\{B_i \in \Gamma \mid B^{i-1}=b^{i-1},A^i=a^i\} = \mathbb{P}\{V_i : h_i(b^{i-1},a_i,V_i)\in\Gamma\} = Q_i(\Gamma|b^{i-1},a_i),\quad \Gamma\in\mathcal{B}(\mathcal{B}_i),\quad i=0,1,\ldots,n,$$

where here $h_i$ denotes the linear map defined by the recursion above,

and the finite block length feedback capacity is characterized using step one of the two-step procedure, as further described in section B.5. entitled “Two-Step Procedure for Characterizing Capacity and Identifying Optimal Input Distributions.”

B.6.1.2. Example MIMO, ANonGN, and AGN Channel Models

Another example linear channel model, defined by linear dynamics, is:

$$B_i = -C_{i,i-1}B_{i-1} + D_{i,i}A_i + V_i,\quad i=0,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}E\{\gamma^{B.1}(A_i,B_{i-1})\} \le \kappa.$$

An example channel noise is non-Gaussian and independent:

$$P_{V^n}(dv^n) = \prod_{i=0}^{n} P_{V_i}(dv_i),$$

with zero mean and covariance matrices:

$$\mu_{V_i} \triangleq E(V_i) = 0,\qquad K_{V_i} \triangleq E(V_iV_i^T),\quad i=0,1,\ldots,n,$$

and an average transmission cost

$$\frac{1}{n+1}\sum_{i=0}^{n}E\{|A_i|\} \le \kappa.$$

B.6.1.2.1. Example MIMO, ANonGN Channel with Memory

For an example channel model as described above with memory, the channel distribution is given by:

$$\mathbb{P}\{B_i \le b_i \mid B^{i-1}=b^{i-1},A^i=a^i\} = \mathbb{P}\Big\{V_i \le b_i + \sum_{j=0}^{i-1}C_{i,j}b_j - D_{i,i}a_i\Big\},\quad i=0,1,\ldots,n,$$

and a characterization, according to step one of the two-step procedure, of the finite block length feedback capacity, with transmission cost, is given by:

$$C_{A^n\rightarrow B^n}^{FB,ANonGN-A.1}(\kappa) = \sup_{\{\pi_i(da_i|b^{i-1}),\,i=0,\ldots,n:\ \frac{1}{n+1}\sum_{i=0}^{n}E|A_i| \le \kappa\}}\Big\{\sum_{i=0}^{n}H(B_i|B^{i-1})\Big\} - H(V^n),$$

with

$$\mathbb{P}\{B_i \le b_i \mid B^{i-1}=b^{i-1}\} = \int_{\mathcal{A}_i}\mathbb{P}\Big\{V_i \le b_i + \sum_{j=0}^{i-1}C_{i,j}b_j - D_{i,i}a_i\Big\}\,\pi_i(da_i|b^{i-1}),\quad i=0,1,\ldots,n.$$

The information structure of the example channel input distribution

$$\{\pi_i(da_i|b^{i-1}) \equiv P_{A_i|B^{i-1}}(da_i|b^{i-1}): i=0,1,\ldots\}$$

implies that a measurable function

$$e_i: \mathcal{B}^{i-1}\times\mathcal{U}_i \rightarrow \mathcal{A}_i,\quad \mathcal{U}_i \triangleq \mathbb{R}^p,\quad a_i = e_i(b^{i-1},u_i),\quad i=0,1,\ldots,n,$$

exists, where $\{U_i: i=0,1,\ldots,n\}$ is a p-dimensional random process with distribution $\{P_{U_i}(du_i): i=0,1,\ldots,n\}$ such that

$$\mathbb{P}\{U_i : e_i(b^{i-1},U_i)\in da_i\} = P_{A_i|B^{i-1}}(da_i|b^{i-1}),\quad i=0,1,\ldots,n.$$
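By way of illustration only, the following Python fragment sketches this realization idea for a single stage: a conditional Gaussian input law is produced by a deterministic map $a_i = e_i(b^{i-1},u_i)$ driving an independent variable $U_i$; the linear form of $e_i$ and the scalar summary of the output history are hypothetical choices.

```python
import numpy as np

# Sketch: realize P(da_i | b^{i-1}) = N(0.4 * s, 1) via a_i = e_i(b^{i-1}, u_i),
# where s is a hypothetical scalar summary of the output history b^{i-1}.
rng = np.random.default_rng(3)

def e_i(history_summary, u):
    # one-to-one in u for a fixed history, as the admissible class requires
    return 0.4 * history_summary + u

u_samples = rng.normal(size=100_000)      # U_i ~ N(0, 1), independent of the history
a_samples = e_i(1.5, u_samples)           # samples from P(da_i | b^{i-1}) with s = 1.5
print(a_samples.mean(), a_samples.std())  # approx. 0.6 and 1.0
```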

Further, according to the definition of the linear channel model, $B_i$ satisfies:

$$A_i = e_i(B^{i-1},U_i),\quad i=0,1,\ldots,n,\qquad B_i = -\sum_{j=0}^{i-1}C_{i,j}B_j + D_{i,i}e_i(B^{i-1},U_i) + V_i,\quad i=0,1,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}E|e_i(B^{i-1},U_i)| \le \kappa.$$

These relations define a class of example admissible functions:

$$\mathcal{E}_{[0,n]}^{ANonGN-A.1-IL}(\kappa) \triangleq \Big\{e_i(b^{i-1},u_i),\ i=0,\ldots,n:\ \text{for a fixed } b^{i-1} \text{ the function } e_i(b^{i-1},\cdot) \text{ is one-to-one and onto } \mathcal{A}_i \text{ for } i=0,\ldots,n,\ \frac{1}{n+1}\sum_{i=0}^{n}E|e_i(B^{i-1},U_i)| \le \kappa\Big\}.$$

Another alternative characterization of the finite block length capacity with transmission cost is:

$$C_{A^n\rightarrow B^n}^{FB,ANonGN-A.1-IL}(\kappa) = \max_{\{P_{U_i}\}_{i=0}^{n},\{e_i(\cdot,\cdot)\}_{i=0}^{n}\in\mathcal{E}_{[0,n]}^{ANonGN-A.1-IL}(\kappa)}\sum_{i=0}^{n}H^{e}(B_i|B^{i-1}) - H(V^n) \equiv \max_{\{P_{U_i}\}_{i=0}^{n},\{e_i(\cdot,\cdot)\}_{i=0}^{n}\in\mathcal{E}_{[0,n]}^{ANonGN-A.1-IL}(\kappa)}\sum_{i=0}^{n}I^{e}(U_i;B_i|B^{i-1}),$$

where

$$\mathbb{P}\{B_i \le b_i \mid B^{i-1}=b^{i-1}\} = \int_{\mathcal{U}_i}\mathbb{P}\Big\{V_i \le b_i + \sum_{j=0}^{i-1}C_{i,j}b_j - D_{i,i}e_i(b^{i-1},u_i)\Big\}\,P_{U_i|B^{i-1}}(du_i|b^{i-1}),\quad i=0,1,\ldots,n.$$

The characterization, or capacity formula, may be solved to find the capacity. For example, a computing device may solve the maximization with dynamic programming or another suitable method, as further discussed with reference to FIG. 3.

B.6.1.2.2. Example AGN Channel with Memory

In another example case, the channel noise process is Gaussian:

$$V_i \sim N(0,K_{V_i}),\quad i=0,1,\ldots,n,$$

or approximately Gaussian. By the entropy maximizing property of Gaussian distributions, the finite block length feedback capacity is bounded from above by the inequality

$$H(B^n) \le H(B^{g,n}),$$

where

$$B^{g,n} \triangleq \{B_i^g: i=0,1,\ldots,n\}$$

is Gaussian distributed. This upper bound may be achieved when

$$\{\pi_i(da_i|b^{i-1}) \equiv P_{A_i|B^{i-1}}^{g}(a_i|b^{i-1}): i=0,1,\ldots\}$$

is conditionally Gaussian and the average transmission cost is satisfied, implying that

$$\{P_{B_i|B^{i-1}}(b_i|b^{i-1}) \equiv P_{B_i|B^{i-1}}^{g}(b_i|b^{i-1}): i=0,1,\ldots,n\}$$

is also conditionally Gaussian.

Similar to the procedure described in section B.6.1.2.1, a measurable function

$$e_i: \mathcal{B}^{i-1}\times\mathcal{U}_i \rightarrow \mathcal{A}_i,\quad \mathcal{U}_i \triangleq \mathbb{R}^p,\quad a_i = e_i(b^{i-1},u_i),\quad i=0,1,\ldots,n,$$

exists such that

$$\mathbb{P}\{U_i : e_i(b^{i-1},U_i)\in da_i\} = P_{A_i|B^{i-1}}^{g}(da_i|b^{i-1}),\quad i=0,1,\ldots,n.$$

Because the channel output is defined by the example linear channel model, $B_i^g$ satisfies

$$B_i^g = -\sum_{j=0}^{i-1}C_{i,j}B_j^g + D_{i,i}e_i(B^{g,i-1},U_i) + V_i,\quad i=1,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}\mathrm{tr}(\mathrm{Cov}(A_i)) \le \kappa.$$

Moreover, the corresponding channel input process, denoted by

$$A^{g,n} \triangleq \{A_i^g: i=0,1,\ldots,n\},$$

is Gaussian distributed, satisfying the average transmission cost constraint. Also, $\{U_i: i=0,1,\ldots,n\}$ is Gaussian and independent of $B^{g,i-1}$ for any $i=0,1,\ldots,n$, and

$$A_i^g = g_i(B^{i-1},U_i) = \sum_{j=0}^{i-1}\Gamma_{i,j}B_j^g + U_i,\quad i=0,1,\ldots,n,\qquad B_i^g = \sum_{j=0}^{i-1}(D_{i,i}\Gamma_{i,j} - C_{i,j})B_j^g + D_{i,i}U_i + V_i,\quad i=1,\ldots,n.$$

In terms of

$$\Gamma_i \triangleq [\Gamma_{i,0}\ \Gamma_{i,1}\ \cdots\ \Gamma_{i,i-1}],\qquad K_{B^{g,i-1}} \triangleq E\{B^{g,i-1}(B^{g,i-1})^T\},\quad i=0,1,\ldots,n,$$

the average transmission cost may be expressed as:

$$\sum_{i=0}^{n}E\|A_i^g\|_{\mathbb{R}^q}^{2} = \sum_{i=0}^{n}\mathrm{tr}\big(\Gamma_i K_{B^{g,i-1}}\Gamma_i^T + K_{U_i}\big).$$

Thus, the finite block length feedback capacity formula, in this example case, is characterized by:

$$C_{A^n\rightarrow B^n}^{FB,AWGNM-A.1}(\kappa) = \max_{\{\{\Gamma_{i,j},K_{U_i}\}_{i=0,j=0}^{n,n-1}:\ \frac{1}{n+1}\sum_{i=0}^{n}\mathrm{tr}(\Gamma_i K_{B^{g,i-1}}\Gamma_i^T + K_{U_i}) \le \kappa\}}\ \frac{p}{2}\sum_{i=0}^{n}\log\frac{D_{i,i}K_{U_i}D_{i,i}^T + K_{V_i}}{K_{V_i}}.$$

The covariance matrices $\{K_{B^{g,i-1}}: i=0,1,\ldots,n\}$ may be found from the recursion for $B_i^g$.

If a process $\{X_i: i=0,1,\ldots,n\}$ of a source, such as the source 102, intended for transmission over this channel is $\mathbb{R}^p$-valued, Gaussian distributed, and Markov, i.e.,

$$P_{X_i|X^{i-1}}(dx_i|x^{i-1}) = P_{X_i|X_{i-1}}(dx_i|x_{i-1}),\quad i=0,1,\ldots,n,$$

and the matrices

$$\{\Gamma_{i,j}^{*},K_{U_i}^{*}: i=0,1,\ldots,n,\ j=0,1,\ldots,n-1\}$$

are the matrices maximizing the above expression, then the coding scheme which achieves the finite block length feedback capacity, in this case, is:

$$A_i^{g,*} = \sum_{j=0}^{i-1}\Gamma_{i,j}^{*}B_j^g + \Delta_i^{*}\{X_i - E\{X_i|B^{g,i-1}\}\},\quad i=0,1,\ldots,n,\qquad \Delta_i^{*} = K_{U_i}^{*,\frac{1}{2}}\{\mathrm{Cov}(X_i - E\{X_i|B^{g,i-1}\})\}^{-\frac{1}{2}},\quad i=0,1,\ldots,n.$$
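By way of illustration only, the following scalar Python simulation sketches the structure of this coding scheme in the memoryless special case (taking $C_{i,j}=0$, $D_{i,i}=1$, and $\Gamma_{i,j}^{*}=0$), so the channel input is the source estimation error scaled to meet the power budget κ at each step; all numerical values are illustrative assumptions.

```python
import numpy as np

# Scalar sketch of the innovations encoder A_i = Delta_i * (X_i - E[X_i | B^{g,i-1}]),
# specialized to a memoryless AGN step; kappa, kv, and n are placeholder values.
rng = np.random.default_rng(0)
n, kappa, kv = 20, 1.0, 1.0
x = rng.normal()                      # Gaussian source symbol to be transmitted
x_hat, p = 0.0, 1.0                   # decoder estimate E[X | B^{i-1}] and its error variance
for i in range(n):
    delta = np.sqrt(kappa / p)        # scale the innovation so E[A_i^2] = kappa
    a = delta * (x - x_hat)           # channel input: scaled estimation error
    b = a + rng.normal(scale=np.sqrt(kv))            # AGN channel use
    x_hat += (delta * p / (delta**2 * p + kv)) * b   # MMSE (Kalman) update
    p *= kv / (delta**2 * p + kv)                    # error variance contraction
print("final error variance:", p, "theory:", (kv / (kappa + kv)) ** n)
```

In this special case the error-variance recursion is deterministic and contracts by the factor $K_V/(\kappa+K_V)$ per transmission, which the final print statement verifies.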

B.6.2. Class A.2.

For an example channel conditional distribution,

$$\{P_{B_i|B^{i-1},A_{i-L}^{i}}: i=0,1,\ldots,n\},$$

the optimal channel input distribution for the finite block length feedback capacity, $C_{A^n\rightarrow B^n}^{FB}$, is included in the subset of channel input distributions:

$$\mathcal{P}_{[0,n]}^{A.2} \triangleq \{P_{A_i|A_{i-L}^{i-1},B^{i-1}}(da_i|a_{i-L}^{i-1},b^{i-1}): i=0,\ldots,n\} \subset \mathcal{P}_{[0,n]},$$

where

$$P_{A_i|A_{i-L}^{i-1},B^{i-1}}(da_i|a_{i-L}^{i-1},b^{i-1})\quad \text{for } i=0,1,\ldots,L$$

may be determined from the convention used in the channel model. For example:

$$P_{A_i|A_{i-L}^{i-1},B^{i-1}}(da_i|a_{i-L}^{i-1},b^{i-1}) = P_{A_i|B^{i-1}}(da_i|b^{i-1}),\quad i=0,1,\ldots,L.$$

The characterization of the finite block length feedback capacity, in this case, is:

$$C_{A^n\rightarrow B^n}^{FB,A.2} \triangleq \sum_{i=0}^{n}\int\log\left(\frac{dP_{B_i|B^{i-1},A_{i-L}^{i}}(\cdot|b^{i-1},a_{i-L}^{i})}{dP_{B_i|B^{i-1}}(\cdot|b^{i-1})}(b_i)\right)P_{B^i,A_{i-L}^{i}}(db^i,da_{i-L}^{i}) = \sum_{i=0}^{n} I(A_{i-L}^{i};B_i|B^{i-1}).$$

If a transmission cost is imposed corresponding to any of

$$\gamma_i^{A.2}(a_{i-N}^{i},b^{i-1}),\quad \gamma_i^{C.2}(a_{i-N}^{i},b_{i-K}^{i-1}),\quad i=0,1,\ldots,n,$$

then the characterization of the finite block length feedback capacity with transmission cost, in this example case, is:

$$C_{A^n\rightarrow B^n}^{FB,A.2,L\wedge N}(\kappa) = \sum_{i=0}^{n} I(A_{i-L\wedge N}^{i};B_i|B^{i-1}),\qquad \mathcal{P}_{[0,n]}^{A.2,L\wedge N} = \mathcal{P}_{[0,n]}^{A.2}\Big|_{L=L\wedge N},\quad L\wedge N \triangleq \max\{L,N\}.$$

B.6.2.1. Example Non-Linear and Linear Channel Models

Similar to the models discussed in section B.6.1.1, channel distributions of example class A.2 (as defined in section B.1. entitled “Characterizing Channels”) may include one or both of distributions defined on finite and countable alphabet spaces and distributions defined on continuous alphabet spaces, which distributions may be induced by the models described below.

Let

$$\mathcal{A}_i \triangleq \mathbb{R}^q,\quad \mathcal{B}_i \triangleq \mathbb{R}^p,\quad \mathcal{V}_i \triangleq \mathbb{R}^r,\quad h_i: \mathcal{B}^{i-1}\times\mathcal{A}_{i-L}^{i}\times\mathcal{V}_i \mapsto \mathcal{B}_i,\quad i=0,1,\ldots,n,$$

where $\{h_i(\cdot,\cdot,\cdot): i=0,1,\ldots\}$ are measurable functions, and $\{V_i: i=0,1,\ldots\}$ is a random channel noise process with joint distribution $P_{V^n}(dv^n)$ on $\mathcal{B}(\mathcal{V}^n)$.

Recursive expressions may define a nonlinear channel model for the example channel class A.2 as follows:

$$B_i = h_i(B^{i-1},A_{i-L}^{i},V_i),\quad i=1,\ldots,n,\qquad B_0 = h_0(B^{-1},A_{-L}^{0},V_0),$$

where transmission in the example model begins at time i=0, and the initial data follow a pre-determined convention. If the channel noise process $\{V_i: i=0,1,\ldots\}$ is an independent sequence (e.g., not necessarily stationary), then the above nonlinear channel model may be applied in step one of the two-step procedure with the induced channel probability distribution:

$$\mathbb{P}\{B_i \in \Gamma \mid B^{i-1}=b^{i-1},A^i=a^i\} = \mathbb{P}\{V_i : h_i(b^{i-1},a_{i-L}^{i},V_i)\in\Gamma\} = Q_i(\Gamma|b^{i-1},a_{i-L}^{i}),\quad \Gamma\in\mathcal{B}(\mathcal{B}_i),\quad i=0,1,\ldots,n.$$

B.6.2.2. Example MIMO AGN Channel with Memory

For a linear version of the example channel class A.2, the channel output may be modeled as:

$$B_i = -\sum_{j=0}^{i-1}C_{i,j}B_j + D_{i,i}A_i + D_{i,i-1}A_{i-1} + V_i,\quad i=0,\ldots,n.$$

The channel noise process, in this example case, is taken to be Gaussian distributed, i.e.,

$$P_{V^n}(dv^n) = \prod_{i=0}^{n}P_{V_i}(dv_i),\quad V_i \sim N(0,K_{V_i}),\quad i=0,1,\ldots,n,$$

and the average transmission cost is taken to be:

$$\frac{1}{n+1}\sum_{i=0}^{n}E\{\|A_i\|_{\mathbb{R}^q}^{2}\} \le \kappa.$$

This is a generalization of the example channel model discussed in section B.6.1.2.2 entitled “Example AGN Channel with Memory,” and, thus, analogous results (e.g., finite block length capacities) are generated by utilizing similar procedures.

Specifically, the channel distribution, in this example case, may be given by:

$$\mathbb{P}\{B_i \le b_i \mid B^{i-1}=b^{i-1},A^i=a^i\} = \mathbb{P}\Big\{V_i \le b_i + \sum_{j=0}^{i-1}C_{i,j}b_j - D_{i,i}a_i - D_{i,i-1}a_{i-1}\Big\},\quad i=0,1,\ldots,n.$$

The characterization of the finite block length feedback capacity, in this example (i.e., a formula for the finite block length feedback capacity), is given by:

$$C_{A^n\rightarrow B^n}^{FB,AWGNM-A.2.1}(\kappa) = \sup_{\{\pi_i(da_i|a_{i-1},b^{i-1}):\,i=0,\ldots,n\}\in\mathcal{P}_{[0,n]}^{A.2}(\kappa)}\Big\{\sum_{i=0}^{n}H(B_i|B^{i-1})\Big\} - H(V^n),$$

with

$$\mathbb{P}\{B_i \le b_i \mid B^{i-1}=b^{i-1}\} = \int\mathbb{P}\Big\{V_i \le b_i + \sum_{j=0}^{i-1}C_{i,j}b_j - D_{i,i}a_i - D_{i,i-1}a_{i-1}\Big\}\,\pi_i(da_i|a_{i-1},b^{i-1})\otimes P_{A_{i-1}|B^{i-1}}(da_{i-1}|b^{i-1}),\quad i=0,1,\ldots,n.$$

The optimal (e.g., capacity achieving) channel input distribution

$$\{\pi_i(da_i|a_{i-1},b^{i-1}) \equiv P_{A_i|A_{i-1},B^{i-1}}^{g}(a_i|a_{i-1},b^{i-1}): i=0,1,\ldots,n\}$$

is conditionally Gaussian, in this example, and the average transmission cost is satisfied. Thus,

$$\{P_{B_i|B^{i-1}}(b_i|b^{i-1}) \equiv P_{B_i|B^{i-1}}^{g}(b_i|b^{i-1}): i=0,1,\ldots,n\}$$

is also conditionally Gaussian.

The information structure of the channel input distribution

$$\{\pi_i(da_i|a_{i-1},b^{i-1}) \equiv P_{A_i|A_{i-1},B^{i-1}}^{g}(a_i|a_{i-1},b^{i-1}): i=0,1,\ldots\}$$

implies the following parametrization of the channel input distribution and the channel output:

$$A_i^g = e_i(A_{i-1}^g,B^{g,i-1},U_i) = \sum_{j=0}^{i-1}\Gamma_{i,j}B_j^g + \Lambda_{i,i-1}A_{i-1}^g + U_i,\quad i=0,1,\ldots,n,$$

$$B_i^g = \sum_{j=0}^{i-1}(D_{i,i}\Gamma_{i,j} - C_{i,j})B_j^g + (D_{i,i}\Lambda_{i,i-1} + D_{i,i-1})A_{i-1}^g + D_{i,i}U_i + V_i,\quad i=1,\ldots,n.$$

Thus, the finite block length feedback capacity may be characterized by using any state space representation of the channel output process. Note, if stationarity is assumed, the above equations are further simplified.

If a process $\{X_i: i=0,1,\ldots,n\}$ of a source, such as the source 102, intended for transmission over this channel is $\mathbb{R}^p$-valued, Gaussian distributed, and Markov, i.e.,

$$P_{X_i|X^{i-1}}(dx_i|x^{i-1}) = P_{X_i|X_{i-1}}(dx_i|x_{i-1}),\quad i=0,1,\ldots,n,$$

and the matrices which maximize the parametrization of the channel input distribution are denoted by:

$$\{\Gamma_{i,j}^{*},\Lambda_{i,i-1}^{*},K_{U_i}^{*}: i=0,1,\ldots,n,\ j=0,1,\ldots,n-1\},$$

then the coding scheme which achieves the finite block length feedback capacity, in this case, is:

$$A_i^{g,*} = \sum_{j=0}^{i-1}\Gamma_{i,j}^{*}B_j^g + \Lambda_{i,i-1}^{*}A_{i-1}^{g,*} + \Delta^{*}(i)\{X_i - E\{X_i|A^{g,i-1},B^{g,i-1}\}\},\quad i=0,1,\ldots,n,$$

$$\Delta^{*}(i) = K_{U_i}^{*,\frac{1}{2}}\{\mathrm{Cov}(X_i - E\{X_i|A^{g,i-1},B^{g,i-1}\})\}^{-\frac{1}{2}},\quad i=0,1,\ldots,n.$$

Although the above example illustrates a characterization of the finite block length capacity, channel input distributions, and capacity achieving coding schemes for a certain example channel of class A.2, the above discussed procedure may be extended to any continuous alphabet channel of class A.2, which channel is not necessarily driven by Gaussian noise processes.

B.7. Example Characterizations of Capacities and Identification of Channel Input Distributions for Channels in Class B

By way of example, finite block length feedback capacity formulas and input distributions, for other example classes of channels, determined according to the two-step procedure (described with reference to FIGS. 2, 3, 4, and 5) are presented below. The corresponding feedback capacities with and without transmission cost are limiting versions of the finite block length feedback capacities presented below.

B.7.1. Class B.1

For an example channel conditional distribution,

$$\{P_{B_i|B_{i-1},A_i}(db_i|b_{i-1},a_i): i=0,1,\ldots,n\},$$

referred to as “Unit Memory Channel Output,” the optimal channel input distribution for $C_{A^n\rightarrow B^n}^{FB}$ is included in the subset:

$$\mathcal{P}_{[0,n]}^{B.1} \triangleq \{P_{A_i|B_{i-1}}(da_i|b_{i-1}): i=0,1,\ldots,n\} \subset \mathcal{P}_{[0,n]}^{A.1}.$$

This subset implies that the corresponding joint process $\{(A_i,B_i): i=0,\ldots,n\}$ and channel output process $\{B_i: i=0,\ldots,n\}$ are first-order Markov, i.e.,

$$P_{A_i,B_i|A^{i-1},B^{i-1}}(da_i,db_i|a^{i-1},b^{i-1}) = P_{A_i,B_i|A_{i-1},B_{i-1}}(da_i,db_i|a_{i-1},b_{i-1}),\quad i=0,\ldots,n,$$

$$P_{B_i|B^{i-1}}(db_i|b^{i-1}) = P_{B_i|B_{i-1}}(db_i|b_{i-1}),\quad i=0,1,\ldots,n.$$

These findings are applicable to any channel input and output alphabets, such as those described earlier, including countable and continuous alphabet spaces.

The characterization of the finite block length feedback capacity is:

$$C_{A^n\rightarrow B^n}^{FB,B.1} \triangleq \sum_{i=0}^{n} I(A_i;B_i|B_{i-1}).$$

This characterization, or capacity formula, is generated by: (i) applying step one of the two-step procedure to determine a candidate set of optimal channel input distributions $\mathcal{P}_{[0,n]}^{A.1}$ (e.g., because the channel is a special case of channel distributions of class A.1); and (ii) applying step two of the two-step procedure to determine that the optimal channel input distribution is included in the narrower (e.g., including fewer elements) set $\mathcal{P}_{[0,n]}^{B.1}$.

If a transmission cost is imposed corresponding to

$$\gamma_i^{B.1}(a_i,b_{i-1}),$$

then the example characterization of the finite block length feedback capacity with transmission cost is:

$$C_{A^n\rightarrow B^n}^{FB,B.1}(\kappa) = \sum_{i=0}^{n} I(A_i;B_i|B_{i-1}).$$

B.7.1.1. Example Non-Linear and Linear Channel Models

Similar to the models discussed in sections B.6.1.1 and B.6.2.1, channel distributions of example class B.1 may include one or both of distributions defined on finite and countable alphabet spaces and distributions defined on continuous alphabet spaces, which distributions may be induced by the models described below.

A nonlinear model of a channel in the class B.1 with continuous alphabet spaces may include a recursive expression:

$$B_i = h_i(B_{i-1},A_i,V_i),\quad i=0,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}E\{\gamma^{B.1}(A_i,B_{i-1})\} \le \kappa.$$

The characterization of the finite block length feedback capacity of the channel defined by this model is given by:

$$C_{A^n\rightarrow B^n}^{FB,NCM-B.1}(\kappa) \triangleq \sup_{\{P_{A_i|B_{i-1}}(da_i|b_{i-1}):\,i=0,\ldots,n\}\in\mathcal{P}_{[0,n]}^{B.1}(\kappa)}\sum_{i=0}^{n}E\left\{\log\left(\frac{P_{B_i|B_{i-1},A_i}(B_i|B_{i-1},A_i)}{P_{B_i|B_{i-1}}(B_i|B_{i-1})}\right)\right\},$$

where

$$P_{B_i|B_{i-1}}(b_i|b_{i-1}) = \int_{\mathcal{A}_i}P_{B_i|B_{i-1},A_i}(b_i|b_{i-1},a_i)\,P_{A_i|B_{i-1}}(da_i|b_{i-1}),\quad i=0,1,\ldots,n,$$

and

$$\mathcal{P}_{[0,n]}^{B.1}(\kappa) \triangleq \Big\{P_{A_i|B_{i-1}}(da_i|b_{i-1}),\ i=0,\ldots,n:\ \frac{1}{n+1}\sum_{i=0}^{n}E\{\gamma_i^{B.1}(A_i,B_{i-1})\} \le \kappa\Big\}.$$

A computing device and/or operator of a communication system may perform the optimization or maximization of $C_{A^n\rightarrow B^n}^{FB,NCM-B.1}(\kappa)$ using dynamic programming. To illustrate this point, let $C_t(\cdot)$, defined on $\mathcal{B}_{t-1}$, denote the “cost-to-go” (corresponding to $C_{A^n\rightarrow B^n}^{FB,NCM-B.1}(\kappa)$) from the time “t” to the terminal time “n” given the value of the output $B_{t-1}=b_{t-1}$. This cost-to-go satisfies the following dynamic programming recursions:

$$C_t(b_{t-1}) = \inf_{s \le 0}\ \sup_{P_{A_t|B_{t-1}}(da_t|b_{t-1})}\Bigg\{\int_{\mathcal{A}_t\times\mathcal{B}_t}\log\left(\frac{P_{B_t|B_{t-1},A_t}(b_t|b_{t-1},a_t)}{P_{B_t|B_{t-1}}(b_t|b_{t-1})}\right)P_{B_t|B_{t-1},A_t}(db_t|b_{t-1},a_t)\otimes P_{A_t|B_{t-1}}(da_t|b_{t-1}) + s\left[\int_{\mathcal{A}_t}\gamma_t^{B.1}(a_t,b_{t-1})\,P_{A_t|B_{t-1}}(da_t|b_{t-1}) - (n+1)\kappa\right] + \int_{\mathcal{A}_t\times\mathcal{B}_t}C_{t+1}(b_t)\,P_{B_t|B_{t-1},A_t}(db_t|b_{t-1},a_t)\otimes P_{A_t|B_{t-1}}(da_t|b_{t-1})\Bigg\},$$

$$C_n(b_{n-1}) = \inf_{s \le 0}\ \sup_{P_{A_n|B_{n-1}}(da_n|b_{n-1})}\Bigg\{\int_{\mathcal{A}_n\times\mathcal{B}_n}\log\left(\frac{P_{B_n|B_{n-1},A_n}(b_n|b_{n-1},a_n)}{P_{B_n|B_{n-1}}(b_n|b_{n-1})}\right)P_{B_n|B_{n-1},A_n}(db_n|b_{n-1},a_n)\otimes P_{A_n|B_{n-1}}(da_n|b_{n-1}) + s\left[\int_{\mathcal{A}_n}\gamma_n^{B.1}(a_n,b_{n-1})\,P_{A_n|B_{n-1}}(da_n|b_{n-1}) - (n+1)\kappa\right]\Bigg\}.$$

The characterization of the finite block length feedback capacity (or the formula for the finite block length feedback capacity) is then expressible as:

$$C_{A^n\rightarrow B^n}^{FB,NCM-B.1}(\kappa) = \int C_0(b^{-1})\,P_{B^{-1}}(db^{-1}).$$

Note, although not discussed in detail here, the above dynamic programming recursions also apply to channels defined on finite alphabet spaces.
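By way of illustration only, the Python sketch below runs the backward cost-to-go recursion on a finite alphabet for a hypothetical binary unit-memory kernel, dropping the transmission cost term (i.e., taking s = 0 with no constraint) and searching over input distributions on a one-dimensional grid; the kernel values and grid resolution are placeholder choices.

```python
import numpy as np
from itertools import product

# Placeholder kernel Q[b_prev][a][b] = P(b_i | b_{i-1}, a_i) for a binary channel.
Q = np.array([[[0.9, 0.1], [0.1, 0.9]],     # b_prev = 0
              [[0.8, 0.2], [0.2, 0.8]]])    # b_prev = 1

def stage_value(C_next, b_prev, grid=51):
    """max over P(a | b_prev) of E[log(Q / P_B)] + E[C_next(B)] via a 1-D grid."""
    best = -np.inf
    for pa0 in np.linspace(0.0, 1.0, grid):
        pa = np.array([pa0, 1.0 - pa0])
        pb = pa @ Q[b_prev]                  # induced P(b_i | b_{i-1})
        val = 0.0
        for a, b in product(range(2), range(2)):
            joint = pa[a] * Q[b_prev, a, b]
            if joint > 0:
                val += joint * (np.log(Q[b_prev, a, b] / pb[b]) + C_next[b])
        best = max(best, val)
    return best

n = 10
C = np.zeros(2)                              # terminal condition: zero beyond time n
for t in range(n, -1, -1):                   # backward in time
    C = np.array([stage_value(C, bp) for bp in (0, 1)])
print("C_0(b_{-1}):", C)                     # averaging over P(b^{-1}) gives the formula
```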

In some implementations, once the optimal channel input distribution and the finite block length feedback capacity are found, a computing device may utilize the Blahut-Arimoto algorithm to compute the maximization of the dynamic programming, working backward in time (i.e., sequentially). This utilization may reduce the computational complexity in solving for the finite block length capacity, the capacity, and the corresponding capacity achieving channel input distribution.
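By way of illustration only, a hedged Python sketch of one such sequential step follows: a standard Blahut-Arimoto iteration maximizing the per-stage conditional mutual information for a fixed hypothetical binary kernel and a fixed conditioning output value, again without a transmission cost term.

```python
import numpy as np

def blahut_arimoto(Qb, iters=200):
    """Maximize I(A; B) for a fixed kernel Qb[a][b] = P(b | b_prev, a)."""
    pa = np.full(Qb.shape[0], 1.0 / Qb.shape[0])   # start from the uniform input law
    for _ in range(iters):
        pb = pa @ Qb                               # induced output marginal
        d = np.exp((Qb * np.log(Qb / pb)).sum(axis=1))   # exp of per-input KL divergence
        pa = pa * d / (pa * d).sum()               # Blahut-Arimoto re-estimation step
    pb = pa @ Qb
    rate = (pa[:, None] * Qb * np.log(Qb / pb)).sum()
    return pa, rate

Q0 = np.array([[0.9, 0.1], [0.1, 0.9]])            # placeholder kernel for b_prev = 0
pa_star, rate = blahut_arimoto(Q0)
print("optimal P(a | b_prev=0):", pa_star, "I(A;B|b_prev=0) in nats:", rate)
```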

To develop another characterization of the finite block length feedback capacity, consider the information structure of the channel input distribution:

$$\{P_{A_i|B_{i-1}}(a_i|b_{i-1}): i=0,1,\ldots\}.$$

This information structure implies that there exists a measurable function:

$$e_i: \mathcal{B}_{i-1}\times\mathcal{U}_i \rightarrow \mathcal{A}_i,\quad \mathcal{U}_i \triangleq \mathbb{R}^p,\quad a_i = e_i(b_{i-1},u_i),\quad i=0,1,\ldots,n,$$

where $\{U_i: i=0,1,\ldots,n\}$ is a p-dimensional random process with distribution $\{P_{U_i}(du_i): i=0,1,\ldots,n\}$ such that

$$\mathbb{P}\{U_i : e_i(b_{i-1},U_i)\in da_i\} = P_{A_i|B_{i-1}}(da_i|b_{i-1}),\quad i=0,1,\ldots,n.$$

Because the channel output is defined by the model for $B_i$,

$$A_i = e_i(B_{i-1},U_i),\quad i=0,1,\ldots,n,\qquad B_i = h_i(B_{i-1},e_i(B_{i-1},U_i),V_i),\quad i=0,1,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}E\|e_i(B_{i-1},U_i)\|_{\mathbb{R}^q}^{2} \le \kappa.$$

Further, a class of admissible functions is:

$$\mathcal{E}_{[0,n]}^{NCM-B.1-IL}(\kappa) \triangleq \Big\{e_i(b_{i-1},u_i),\ i=0,\ldots,n:\ \text{for a fixed } b_{i-1} \text{ the function } e_i(b_{i-1},\cdot) \text{ is one-to-one and onto } \mathcal{A}_i \text{ for } i=0,\ldots,n,\ \frac{1}{n+1}\sum_{i=0}^{n}E\|e_i(B_{i-1},U_i)\|_{\mathbb{R}^q}^{2} \le \kappa\Big\}.$$

The alternative example characterization of the finite block length feedback capacity is:

$$C_{A^n\rightarrow B^n}^{FB,NCM-B.1}(\kappa) = \max_{\{P_{U_i}\}_{i=0}^{n},\{e_i(\cdot,\cdot)\}_{i=0}^{n}\in\mathcal{E}_{[0,n]}^{NCM-B.1-IL}(\kappa)}\sum_{i=0}^{n}E\left\{\log\left(\frac{P_{B_i|B_{i-1},A_i}(B_i|B_{i-1},e_i(B_{i-1},U_i))}{P_{B_i|B_{i-1}}(B_i|B_{i-1})}\right)\right\} \equiv \max_{\{P_{U_i}\}_{i=0}^{n},\{e_i(\cdot,\cdot)\}_{i=0}^{n}\in\mathcal{E}_{[0,n]}^{NCM-B.1-IL}(\kappa)}\sum_{i=0}^{n}I(U_i;B_i|B_{i-1}),$$

with

$$P_{B_i|B_{i-1}}(b_i|b_{i-1}) = \int_{\mathcal{U}_i}P_{B_i|B_{i-1},A_i}(b_i|b_{i-1},e_i(b_{i-1},u_i))\otimes P_{U_i|B_{i-1}}(du_i|b_{i-1}),\quad i=0,1,\ldots,n.$$

A computing device or operator of a communication system may solve this example maximization of $C_{A^n\rightarrow B^n}^{FB,NCM-B.1}(\kappa)$ via dynamic programming or the stochastic calculus of variations, for example. Because the optimal channel input distribution is generated by the above described procedure, computations using the dynamic programming equation, or the Blahut-Arimoto algorithm applied sequentially in time to the dynamic programming, are simplified. Moreover, for stationary versions (e.g., not varying in time), the algorithms further simplify.

A linear model of a channel in the class B.1 may be expressed via a recursive expression:

$$B_i = -C_{i,i-1}B_{i-1} + D_{i,i}A_i + V_i,\quad i=0,1,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}E\{\gamma^{B.1}(A_i,B_{i-1})\} \le \kappa,$$

where $\{V_i: i=0,1,\ldots\}$ is independently distributed according to

$$P_{V^n}(dv^n) = \prod_{i=0}^{n}P_{V_i}(dv_i),$$

with zero mean and covariance matrices:

$$\mu_{V_i} \triangleq E(V_i) = 0,\qquad K_{V_i} \triangleq E(V_iV_i^T),\quad i=0,1,\ldots,n.$$

For each i=0, 1, \ldots, n, the coefficients $\{C_{i,i-1},D_{i,i}: i=0,\ldots,n\}$ are real-valued matrices with dimensions p by p and p by q, respectively (e.g., $\{p,q\}$ being positive integers). The channel distribution is given by:

$$\mathbb{P}\{B_i \le b_i \mid B^{i-1}=b^{i-1},A^i=a^i\} = \mathbb{P}\{V_i \le b_i + C_{i,i-1}b_{i-1} - D_{i,i}a_i\},\quad i=0,1,\ldots,n,$$

and the directed information satisfies

$$I(A^n\rightarrow B^n) = \sum_{i=0}^{n}\{H(B_i|B^{i-1}) - H(B_i|B^{i-1},A_i)\} = \sum_{i=0}^{n}H(B_i|B_{i-1}) - H(V^n),$$

and an example characterization of the finite block length feedback capacity is given by:

$$C_{A^n\rightarrow B^n}^{FB,LCM-B.1}(\kappa) = \sup_{\{P_{A_i|B_{i-1}}(da_i|b_{i-1}),\,i=0,\ldots,n\}\in\mathcal{P}_{[0,n]}^{LCM-B.1}(\kappa)}\Big\{\sum_{i=0}^{n}H(B_i|B_{i-1})\Big\} - H(V^n),$$

where

$$\mathcal{P}_{[0,n]}^{LCM-B.1}(\kappa) \triangleq \Big\{P_{A_i|B_{i-1}}(da_i|b_{i-1}),\ i=0,\ldots,n:\ \frac{1}{n+1}\sum_{i=0}^{n}E\{\gamma_i^{B.1}(A_i,B_{i-1})\} \le \kappa\Big\}$$

and

$$\mathbb{P}\{B_i \le b_i \mid B_{i-1}=b_{i-1}\} = \int_{\mathcal{A}_i}\mathbb{P}\{V_i \le b_i + C_{i,i-1}b_{i-1} - D_{i,i}a_i\}\,P_{A_i|B_{i-1}}(da_i|b_{i-1}),\quad i=0,1,\ldots,n.$$

The “cost-to-go” satisfies the following example dynamic programming recursions:

$$C_t(b_{t-1}) = \inf_{s \le 0}\ \sup_{P_{A_t|B_{t-1}}(da_t|b_{t-1})}\Bigg\{-\int_{\mathcal{A}_t\times\mathcal{B}_t}\log\big(P_{B_t|B_{t-1}}(b_t|b_{t-1})\big)\,P_{B_t|B_{t-1},A_t}(db_t|b_{t-1},a_t)\otimes P_{A_t|B_{t-1}}(da_t|b_{t-1}) + s\left[\int_{\mathcal{A}_t}\gamma_t^{B.1}(a_t,b_{t-1})\,P_{A_t|B_{t-1}}(da_t|b_{t-1}) - (n+1)\kappa\right] + \int_{\mathcal{A}_t\times\mathcal{B}_t}C_{t+1}(b_t)\,P_{B_t|B_{t-1},A_t}(db_t|b_{t-1},a_t)\otimes P_{A_t|B_{t-1}}(da_t|b_{t-1})\Bigg\},$$

$$C_n(b_{n-1}) = \inf_{s \le 0}\ \sup_{P_{A_n|B_{n-1}}(da_n|b_{n-1})}\Bigg\{-\int_{\mathcal{A}_n\times\mathcal{B}_n}\log\big(P_{B_n|B_{n-1}}(b_n|b_{n-1})\big)\,P_{B_n|B_{n-1},A_n}(db_n|b_{n-1},a_n)\otimes P_{A_n|B_{n-1}}(da_n|b_{n-1}) + s\left[\int_{\mathcal{A}_n}\gamma_n^{B.1}(a_n,b_{n-1})\,P_{A_n|B_{n-1}}(da_n|b_{n-1}) - (n+1)\kappa\right]\Bigg\}.$$

Further, a computing device and/or operator may generate an alternative characterization of the finite block length feedback capacity based on the information structure of the channel input distribution. For example, the channel input distribution:

$$\{P_{A_i|B_{i-1}}(a_i|b_{i-1}): i=0,1,\ldots\}$$

implies that there exists a measurable function:

$$e_i: \mathcal{B}_{i-1}\times\mathcal{U}_i \rightarrow \mathcal{A}_i,\quad \mathcal{U}_i \triangleq \mathbb{R}^p,\quad a_i = e_i(b_{i-1},u_i),\quad i=0,1,\ldots,n,$$

where $\{U_i: i=0,1,\ldots,n\}$ is a p-dimensional random process with distribution $\{P_{U_i}(du_i): i=0,1,\ldots,n\}$ such that

$$\mathbb{P}\{U_i : e_i(b_{i-1},U_i)\in da_i\} = P_{A_i|B_{i-1}}(da_i|b_{i-1}),\quad i=0,1,\ldots,n.$$

Based on the existence of this example measurable function:

$$A_i = e_i(B_{i-1},U_i),\quad i=0,1,\ldots,n,\qquad B_i = -C_{i,i-1}B_{i-1} + D_{i,i}e_i(B_{i-1},U_i) + V_i,\quad i=0,1,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}E\|e_i(B_{i-1},U_i)\|_{\mathbb{R}^q}^{2} \le \kappa,$$

and an example set of admissible functions is:

$$\mathcal{E}_{[0,n]}^{LCM-B.1-IL}(\kappa) \triangleq \Big\{e_i(b_{i-1},u_i),\ i=0,\ldots,n:\ \text{for a fixed } b_{i-1} \text{ the function } e_i(b_{i-1},\cdot) \text{ is one-to-one and onto } \mathcal{A}_i \text{ for } i=0,\ldots,n,\ \frac{1}{n+1}\sum_{i=0}^{n}E\|e_i(B_{i-1},U_i)\|_{\mathbb{R}^q}^{2} \le \kappa\Big\}.$$

The example alternative characterization of the finite block length feedback capacity is

$$C_{A^n\rightarrow B^n}^{FB,LCM-B.1}(\kappa) = \max_{\{P_{U_i}\}_{i=0}^{n},\{e_i(\cdot,\cdot)\}_{i=0}^{n}\in\mathcal{E}_{[0,n]}^{LCM-B.1-IL}(\kappa)}\sum_{i=0}^{n}H^{e}(B_i|B_{i-1}) - H(V^n) \equiv \max_{\{P_{U_i}\}_{i=0}^{n},\{e_i(\cdot,\cdot)\}_{i=0}^{n}\in\mathcal{E}_{[0,n]}^{LCM-B.1-IL}(\kappa)}\sum_{i=0}^{n}I(U_i;B_i|B_{i-1}),$$

with

$$\mathbb{P}\{B_i \le b_i \mid B_{i-1}=b_{i-1}\} = \int_{\mathcal{U}_i}\mathbb{P}\{V_i \le b_i + C_{i,i-1}b_{i-1} - D_{i,i}e_i(b_{i-1},u_i)\}\,P_{U_i|B_{i-1}}(du_i|b_{i-1}),\quad i=0,1,\ldots,n.$$

B.7.1.2. Example MIMO AGN Channel with Memory

The following illustrates characterizations of capacities and channel input distributions for a special case of a channel in class B.1 described by a linear channel model. In the special case, the channel noise process is

$$V_i \sim N(0,K_{V_i}),\quad i=0,1,\ldots,n,$$

or approximately Gaussian.

By the entropy maximizing property of Gaussian distributions, the finite block length feedback capacity is bounded from above by the inequality

$$H(B^n) \le H(B^{g,n}),$$

where

$$B^{g,n} \triangleq \{B_i^g: i=0,1,\ldots,n\}$$

is Gaussian distributed. This upper bound may be achieved when

$$\{P_{A_i|B_{i-1}}(da_i|b_{i-1}) \equiv P_{A_i|B_{i-1}}^{g}(a_i|b_{i-1}): i=0,1,\ldots,n\}$$

is conditionally Gaussian and the average transmission cost is satisfied, implying that

$$\{P_{B_i|B_{i-1}}(b_i|b_{i-1}) \equiv P_{B_i|B_{i-1}}^{g}(b_i|b_{i-1}): i=0,1,\ldots,n\}$$

is also conditionally Gaussian.

Similar to the other procedures described above with reference to linear channel models, a measurable function

$$e_i: \mathcal{B}_{i-1}\times\mathcal{U}_i \rightarrow \mathcal{A}_i,\quad \mathcal{U}_i \triangleq \mathbb{R}^p,\quad a_i = e_i(b_{i-1},u_i),\quad i=0,1,\ldots,n,$$

exists such that

$$\mathbb{P}\{U_i : e_i(b_{i-1},U_i)\in da_i\} = P_{A_i|B_{i-1}}(da_i|b_{i-1}),\quad i=0,1,\ldots,n.$$

Based on the existence of this measurable function, the channel is given by

$$B_i^g = -C_{i,i-1}B_{i-1}^g + D_{i,i}e_i(B_{i-1}^g,U_i) + V_i,\quad i=1,\ldots,n,\qquad \frac{1}{n+1}\sum_{i=0}^{n}\mathrm{tr}(\mathrm{Cov}(A_i)) \le \kappa.$$

Because the channel output process is Gaussian distributed, and a linear combination of any sequence of random variables is Gaussian distributed if and only if the sequence of random variables is also jointly Gaussian distributed, the functions

$$\{e_i(\cdot,\cdot): i=0,1,\ldots,n\}$$

are necessarily linear and $\{U_i: i=0,1,\ldots,n\}$ is necessarily a Gaussian sequence, in this example case. These properties imply that the corresponding channel input process, denoted by

$$A^{g,n} \triangleq \{A_i^g: i=0,1,\ldots,n\},$$

is Gaussian distributed, satisfying the average transmission cost constraint. Moreover, $U_i$ is independent of $B^{g,i-1}$ for any $i=0,1,\ldots,n$. Thus,

$$A_i^g = e_i(B_{i-1}^g,U_i) = \Gamma_{i,i-1}B_{i-1}^g + U_i,\quad i=0,1,\ldots,n,$$

$$B_i^g = (D_{i,i}\Gamma_{i,i-1} - C_{i,i-1})B_{i-1}^g + D_{i,i}U_i + V_i,\quad i=0,\ldots,n.$$

Also, because the output process is conditionally Gaussian, in this example case, the conditional entropies

$$H(B_i^g|B_{i-1}^g = b_{i-1})$$

are independent of $b_{i-1}$, and

$$\sum_{i=0}^{n}H(B_i^g|B_{i-1}^g) - H(V^n) = \frac{p}{2}\sum_{i=0}^{n}\log\frac{D_{i,i}K_{U_i}D_{i,i}^T + K_{V_i}}{K_{V_i}}.$$

Then, defining

$$\bar{\Gamma}_{i,i-1} \triangleq D_{i,i}\Gamma_{i,i-1} - C_{i,i-1},\quad i=0,1,\ldots,n,\qquad \bar{\Gamma}_{0,-1} \triangleq 0,$$

a computing device or operator of a communication system may express:

$$A_i^g = \Gamma_{i,i-1}B_{i-1}^g + U_i,\quad i=0,1,\ldots,n,$$

$$B_i^g = \bar{\Gamma}_{i,i-1}B_{i-1}^g + D_{i,i}U_i + V_i,\quad i=1,2,\ldots,n,\qquad B_0^g = D_{0,0}U_0 + V_0,$$

$$K_{B_{i-1}^g} \triangleq E\{B_{i-1}^g(B_{i-1}^g)^T\},\quad i=0,1,\ldots,n,$$

with the following recursion:

$$K_{B_i^g} = \bar{\Gamma}_{i,i-1}K_{B_{i-1}^g}\bar{\Gamma}_{i,i-1}^T + D_{i,i}K_{U_i}D_{i,i}^T + K_{V_i},\quad i=1,\ldots,n,\qquad K_{B_0^g} = D_{0,0}K_{U_0}D_{0,0}^T + K_{V_0}.$$

In this example case, the average transmission cost is:

$$\sum_{i=0}^{n}E\|A_i^g\|_{\mathbb{R}^q}^{2} = \sum_{i=0}^{n}\mathrm{tr}\big(\Gamma_{i,i-1}K_{B_{i-1}^g}\Gamma_{i,i-1}^T + K_{U_i}\big),$$

and the example finite block length feedback capacity is characterized by

$$C_{A^n\rightarrow B^n}^{FB,AWGNM-B.1}(\kappa) = \max_{\{\{\Gamma_{i,i-1},K_{U_i}\}_{i=0}^{n}:\ \frac{1}{n+1}\sum_{i=0}^{n}\mathrm{tr}(\Gamma_{i,i-1}K_{B_{i-1}^g}\Gamma_{i,i-1}^T + K_{U_i}) \le \kappa\}}\ \frac{p}{2}\sum_{i=0}^{n}\log\frac{D_{i,i}K_{U_i}D_{i,i}^T + K_{V_i}}{K_{V_i}}.$$
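By way of illustration only, the following Python sketch grid-searches stationary scalar parameters $(\Gamma, K_U)$ for the unit-memory model above, propagating the covariance recursion for $K_{B_{i-1}^g}$ and enforcing the average transmission cost; the coefficients, noise variance, horizon, and grid resolution are illustrative assumptions.

```python
import numpy as np

# Scalar (p = q = 1) sketch of maximizing (1/2) sum_i log((d^2 K_U + K_V)/K_V)
# over stationary (Gamma, K_U), subject to the average cost
# (1/(n+1)) sum_i (Gamma^2 K_B_{i-1} + K_U) <= kappa; all numbers are placeholders.
c, d, kv, kappa, n = 0.5, 1.0, 1.0, 1.0, 8

def rate_and_cost(gamma, ku):
    kb, rate, cost = 0.0, 0.0, 0.0            # K_B^{g,-1} = 0: no prior output
    for _ in range(n + 1):
        rate += 0.5 * np.log((d**2 * ku + kv) / kv)
        cost += gamma**2 * kb + ku
        kb = (d * gamma - c)**2 * kb + d**2 * ku + kv   # covariance recursion
    return rate, cost / (n + 1)

best = (-np.inf, None)
for gamma in np.linspace(-1.0, 1.0, 81):
    for ku in np.linspace(0.01, kappa, 60):
        r, avg_cost = rate_and_cost(gamma, ku)
        if avg_cost <= kappa and r > best[0]:
            best = (r, (gamma, ku))
print("best rate (nats):", best[0], "at (Gamma, K_U) =", best[1])
```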

If a process $\{X_i: i=0,1,\ldots,n\}$ of a source, such as the source 102, intended for transmission over this channel is $\mathbb{R}^p$-valued, Gaussian distributed, and Markov, and the matrices which maximize the finite block length feedback capacity are

$$\{\Gamma_{i,i-1}^{*},K_{U_i}^{*}: i=0,1,\ldots,n\},$$

then the coding scheme which achieves the finite block length feedback capacity is:

$$A_i^{g,*} = \Gamma_{i,i-1}^{*}B_{i-1}^g + \Delta_{i,i-1}^{*}\{X_i - E\{X_i|B^{g,i-1}\}\},\quad i=0,1,\ldots,n,\qquad \Delta_{i,i-1}^{*} = K_{U_i}^{*,\frac{1}{2}}\{\mathrm{Cov}(X_i - E\{X_i|B^{g,i-1}\})\}^{-\frac{1}{2}},\quad i=0,1,\ldots,n.$$

B.7.2. Class B.2

For an example channel conditional distribution,

$$\{P_{B_i|B_{i-M}^{i-1},A_i}(db_i|b_{i-M}^{i-1},a_i): i=0,1,\ldots,n\},$$

where M is a finite nonnegative integer, the optimal channel input distribution for $C_{A^n\rightarrow B^n}^{FB}$ is included in the set

$$\mathcal{P}_{[0,n]}^{B.2} \triangleq \{P_{A_i|B_{i-M}^{i-1}}(da_i|b_{i-M}^{i-1}): i=0,1,\ldots,n\} \Subset \mathcal{P}_{[0,n]}^{A.1}.$$

This fact implies that the corresponding joint process $\{(A_i,B_i): i=0,\ldots,n\}$ and channel output process $\{B_i: i=0,\ldots,n\}$ are M-order Markov processes.

An example characterization of the finite block length feedback capacity is

$$C_{A^n\rightarrow B^n}^{FB,B.2} \triangleq \sum_{i=0}^{n}\int\log\left(\frac{dP_{B_i|B_{i-M}^{i-1},A_i}(\cdot|b_{i-M}^{i-1},a_i)}{dP_{B_i|B_{i-M}^{i-1}}(\cdot|b_{i-M}^{i-1})}(b_i)\right)P_{B_{i-M}^{i},A_i}(db_{i-M}^{i},da_i) = \sum_{i=0}^{n}I(A_i;B_i|B_{i-M}^{i-1}),$$

where

$$P_{B_i|B_{i-M}^{i-1}}(db_i|b_{i-M}^{i-1}) = \int P_{B_i|B_{i-M}^{i-1},A_i}(db_i|b_{i-M}^{i-1},a_i)\otimes P_{A_i|B_{i-M}^{i-1}}(da_i|b_{i-M}^{i-1}),\quad i=0,1,\ldots,n.$$

Also, if a transmission cost is imposed, then the example characterization may be expressed as

$$C_{A^n\rightarrow B^n}^{FB,B.2,M\wedge K}(\kappa) \triangleq \sum_{i=0}^{n}\int\log\left(\frac{P_{B_i|B_{i-M}^{i-1},A_i}(\cdot|b_{i-M}^{i-1},a_i)}{P_{B_i|B_{i-M\wedge K}^{i-1}}(\cdot|b_{i-M\wedge K}^{i-1})}(b_i)\right)P_{B_{i-M\wedge K}^{i},A_i}(db_{i-M\wedge K}^{i},da_i),$$

where

$$\mathcal{P}_{[0,n]}^{B.2,M\wedge K} \triangleq \{P_{A_i|B_{i-M\wedge K}^{i-1}}(da_i|b_{i-M\wedge K}^{i-1}): i=0,1,\ldots,n\},$$

$$P_{B_{i-M\wedge K}^{i},A_i}(db_{i-M\wedge K}^{i},da_i) = P_{B_i|B_{i-M}^{i-1},A_i}(db_i|b_{i-M}^{i-1},a_i)\otimes P_{A_i|B_{i-M\wedge K}^{i-1}}(da_i|b_{i-M\wedge K}^{i-1})\otimes P_{B_{i-M\wedge K}^{i-1}}(db_{i-M\wedge K}^{i-1}),\quad i=0,1,\ldots,n,$$

$$P_{B_i|B_{i-M\wedge K}^{i-1}}(db_i|b_{i-M\wedge K}^{i-1}) = \int P_{B_i|B_{i-M}^{i-1},A_i}(db_i|b_{i-M}^{i-1},a_i)\otimes P_{A_i|B_{i-M\wedge K}^{i-1}}(da_i|b_{i-M\wedge K}^{i-1}),\quad i=0,1,\ldots,n.$$

B.7.3. Class B.3

For an example channel conditional distribution,

$$\{P_{B_i|B_{i-M}^{i-1},A^{i}}(db_i|b_{i-M}^{i-1},a^{i}): i=0,1,\ldots,n\},$$

where M is a finite nonnegative integer, the optimal channel input distribution for $C_{A^n\rightarrow B^n}^{FB}$ is included in the set

$$\mathcal{P}_{[0,n]}^{B.3} \triangleq \{P_{A_i|A^{i-1},B^{i-1}}(da_i|a^{i-1},b^{i-1}): i=0,1,\ldots,n\} \Subset \mathcal{P}_{[0,n]}.$$

An example characterization of the finite block length feedback capacity is

$$C_{A^n\rightarrow B^n}^{FB,B.3} \triangleq \sum_{i=0}^{n}\int\log\left(\frac{dP_{B_i|B_{i-M}^{i-1},A^{i}}(\cdot|b_{i-M}^{i-1},a^{i})}{dP_{B_i|B^{i-1}}(\cdot|b^{i-1})}(b_i)\right)P_{B^{i},A^{i}}(db^{i},da^{i}).$$

If a transmission cost is imposed corresponding to any instantaneous transmission cost function of classes A, B, and C, then the example characterization of the finite block length feedback capacity is given by the above expression for $\mathcal{P}_{[0,n]}^{B.3}$ using $\mathcal{P}_{[0,n]}^{B.3}\cap\mathcal{P}_{[0,n]}(\kappa)$.

B.8. Example Characterizations of Capacities and Identification of Channel Input Distributions for Channels in Class C

By way of example, finite block length feedback capacity formulas and input distributions, for still further classes of channels, determined according to the two-step procedure (described with reference to FIGS. 2, 3, 4, and 5) are presented below. The corresponding feedback capacities with and without transmission cost are limiting versions of the finite block length feedback capacities presented below.

B.8.1. Class C.1

For an example channel with a channel conditional distribution:

$$\{P_{B_i|B_{i-2}^{i-1},A_{i-1}^{i}}(db_i|b_{i-2}^{i-1},a_{i-1}^{i}): i=0,1,\ldots,n\},$$

the optimal channel input distribution for $C_{A^n\rightarrow B^n}^{FB}$ is included in the set

$$\mathcal{P}_{[0,n]}^{C.1} \triangleq \{P_{A_i|A_{i-1},B_{i-2}^{i-1}}(da_i|a_{i-1},b_{i-2}^{i-1}): i=0,1,\ldots,n\}.$$

This inclusion implies that the corresponding joint process $\{(A_i,B_i): i=0,\ldots,n\}$ and channel output process $\{B_i: i=0,\ldots,n\}$ are second-order Markov processes, i.e.,

$$P_{A_i,B_i|A^{i-1},B^{i-1}}(da_i,db_i|a^{i-1},b^{i-1}) = P_{A_i,B_i|A_{i-2}^{i-1},B_{i-2}^{i-1}}(da_i,db_i|a_{i-2}^{i-1},b_{i-2}^{i-1}),\quad i=0,\ldots,n,$$

$$P_{B_i|B^{i-1}}(db_i|b^{i-1}) = P_{B_i|B_{i-2}^{i-1}}(db_i|b_{i-2}^{i-1}),\quad i=0,1,\ldots,n.$$

An example characterization of the finite block length feedback capacity is

$$C_{A^n\rightarrow B^n}^{FB,C.1} \triangleq \frac{1}{n+1}\sum_{i=0}^{n}I(A_{i-1}^{i};B_i|B_{i-2}^{i-1}).$$

If a transmission cost is imposed, this example characterization may be expressed as:

$$C_{A^n\rightarrow B^n}^{FB,C.1,2\wedge K}(\kappa) \triangleq \sum_{i=0}^{n}\int\log\left(\frac{dP_{B_i|B_{i-2}^{i-1},A_{i-1}^{i}}(\cdot|b_{i-2}^{i-1},a_{i-1}^{i})}{dP_{B_i|B_{i-2\wedge K}^{i-1}}(\cdot|b_{i-2\wedge K}^{i-1})}(b_i)\right)P_{B_{i-2\wedge K}^{i},A_{i-1}^{i}}(db_{i-2\wedge K}^{i},da_{i-1}^{i}),$$

where

$$\mathcal{P}_{[0,n]}^{C.1,2\wedge K} \triangleq \{P_{A_i|A_{i-1},B_{i-2\wedge K}^{i-1}}(da_i|a_{i-1},b_{i-2\wedge K}^{i-1}): i=0,1,\ldots,n\},$$

$$P_{B_{i-2\wedge K}^{i},A_{i-1}^{i}}(db_{i-2\wedge K}^{i},da_i,da_{i-1}) = P_{B_i|B_{i-2}^{i-1},A_{i-1}^{i}}(db_i|b_{i-2}^{i-1},a_{i-1}^{i})\otimes P_{A_i|A_{i-1},B_{i-2\wedge K}^{i-1}}(da_i|a_{i-1},b_{i-2\wedge K}^{i-1})\otimes P_{A_{i-1}|B_{i-2\wedge K}^{i-1}}(da_{i-1}|b_{i-2\wedge K}^{i-1})\otimes P_{B_{i-2\wedge K}^{i-1}}(db_{i-2\wedge K}^{i-1}),\quad i=0,1,\ldots,n,$$

$$P_{B_i|B_{i-2\wedge K}^{i-1}}(db_i|b_{i-2\wedge K}^{i-1}) = \int P_{B_i|B_{i-2}^{i-1},A_{i-1}^{i}}(db_i|b_{i-2}^{i-1},a_{i-1}^{i})\otimes P_{A_i|A_{i-1},B_{i-2\wedge K}^{i-1}}(da_i|a_{i-1},b_{i-2\wedge K}^{i-1})\otimes P_{A_{i-1}|B_{i-2\wedge K}^{i-1}}(da_{i-1}|b_{i-2\wedge K}^{i-1}),\quad i=0,1,\ldots,n.$$

B.8.2. Class C.2

For an example channel conditional distribution,

$$\{P_{B_i|B_{i-M}^{i-1},A_{i-L}^{i}}: i=0,1,\ldots,n\},$$

the optimal channel input distribution for $C_{A^n\rightarrow B^n}^{FB}$ is included in the set

$$\mathcal{P}_{[0,n]}^{C.2} \triangleq \{P_{A_i|A_{i-L}^{i-1},B_{i-M\wedge L}^{i-1}}(da_i|a_{i-L}^{i-1},b_{i-M\wedge L}^{i-1}): i=0,1,\ldots,n\} \Subset \mathcal{P}_{[0,n]}^{A.2}.$$

This inclusion implies that the corresponding joint process $\{(A_i,B_i): i=0,\ldots,n\}$ and channel output process $\{B_i: i=0,\ldots,n\}$ are limited-memory Markov processes.

B.8.2.1. Unit Memory Channel Input Output (UMCIO) Example

For a special case of the example channel class C.2, an example channel conditional distribution is

$$\{P_{B_i|B_{i-1},A_{i-1},A_i}(db_i|b_{i-1},a_i,a_{i-1}): i=0,1,\ldots,n\},$$

and the optimal channel input distribution for $C_{A^n\rightarrow B^n}^{FB}$ may be included in the set

$$\mathcal{P}_{[0,n]}^{UMCIO} \triangleq \{P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1}): i=0,1,\ldots,n\} \Subset \mathcal{P}_{[0,n]}^{C.1}.$$

This inclusion implies that the corresponding joint process $\{(A_i,B_i): i=0,\ldots,n\}$ and channel output process $\{B_i: i=0,\ldots,n\}$ are first-order Markov processes.

An example characterization of the finite block length feedback capacity, in this example case, is

$$C_{A^n\rightarrow B^n}^{FB,UMCIO} \triangleq \sup_{\{P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1}):\,i=0,1,\ldots,n\}}\sum_{i=0}^{n}I(A_{i-1},A_i;B_i|B_{i-1}),$$

where

$$I(A_{i-1},A_i;B_i|B_{i-1}) = \int\log\left(\frac{dP_{B_i|B_{i-1},A_i,A_{i-1}}(\cdot|b_{i-1},a_i,a_{i-1})}{dP_{B_i|B_{i-1}}(\cdot|b_{i-1})}(b_i)\right)P_{B_i|B_{i-1},A_i,A_{i-1}}(db_i|b_{i-1},a_i,a_{i-1})\otimes P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1})\otimes P_{A_{i-1},B_{i-1}}(da_{i-1},db_{i-1}),\quad i=0,1,\ldots,n,$$

and

$$P_{B_i|B_{i-1}}(db_i|b_{i-1}) = \int P_{B_i|B_{i-1},A_{i-1},A_i}(db_i|b_{i-1},a_{i-1},a_i)\otimes P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1})\otimes P_{A_{i-1}|B_{i-1}}(da_{i-1}|b_{i-1}),\quad i=0,1,\ldots,n.$$

B.8.2.2. Unit Memory Channel Input (UMCI) Example

For another special case of the example channel class C.2, an example channel conditional distribution is

$$\{P_{B_i|A_{i-1},A_i}(db_i|a_{i-1},a_i): i=0,1,\ldots,n\},$$

and the optimal channel input distribution for $C_{A^n\rightarrow B^n}^{FB}$ may be included in the set

$$\mathcal{P}_{[0,n]}^{UMCI} \triangleq \{P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1}): i=0,1,\ldots,n\}.$$

This inclusion implies that the corresponding joint process $\{(A_i,B_i): i=0,\ldots,n\}$ and channel output process $\{B_i: i=0,\ldots,n\}$ are first-order Markov processes.

An example characterization of the finite block length feedback capacity, in this example case, is

$$C_{A^n\rightarrow B^n}^{FB,UMCI} \triangleq \sup_{\{P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1}):\,i=0,1,\ldots,n\}}\frac{1}{n+1}\sum_{i=0}^{n}I(A_{i-1},A_i;B_i|B_{i-1}),$$

where

$$I(A_{i-1},A_i;B_i|B_{i-1}) = \int\log\left(\frac{dP_{B_i|A_i,A_{i-1}}(\cdot|a_i,a_{i-1})}{dP_{B_i|B_{i-1}}(\cdot|b_{i-1})}(b_i)\right)P_{B_i|A_i,A_{i-1}}(db_i|a_i,a_{i-1})\otimes P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1})\otimes P_{A_{i-1},B_{i-1}}(da_{i-1},db_{i-1}),\quad i=0,1,\ldots,n,$$

and

$$P_{B_i|B_{i-1}}(db_i|b_{i-1}) = \int P_{B_i|A_{i-1},A_i}(db_i|a_{i-1},a_i)\otimes P_{A_i|A_{i-1},B_{i-1}}(da_i|a_{i-1},b_{i-1})\otimes P_{A_{i-1}|B_{i-1}}(da_{i-1}|b_{i-1}),\quad i=0,1,\ldots,n.$$

B.9. Example Characterizations of Capacities and Identification of Channel Input Distributions for MIMO ANonGN Channels with Memory

The two-step procedure (described with reference to FIGS. 2, 3, 4, and 5) may also allow computing devices and/or operators of communication systems to characterize generalizations of Additive Gaussian Noise (AGN) channels and nonstationary Multiple-Input Multiple-Output (MIMO) Additive Non-Gaussian Noise (ANonGN) channels. These other types of channels may be defined by the example model:

$$B_i = \sum_{j=1}^{L}D_{i,j}A_{i-j} + V_i,\quad i=0,1,\ldots,n,\qquad \frac{1}{n+1}E\Big\{\sum_{i=0}^{n}\gamma_i(A_{i-L}^{i},B^{i-1})\Big\} \le \kappa,$$

where $\{V_i: i=0,1,\ldots,n\}$ is a p-dimensional nonstationary non-Gaussian distributed noise process, $\{A_i: i=0,1,\ldots,n\}$ is a q-dimensional channel input process, and a condition “$A^n$ is causally related to $V^n$” is represented, in the example model, by

$\begin{matrix}{{P_{A^{n},V^{n}}\left( {{a^{n}},{v^{n}}} \right)} = {\otimes_{i = 0}^{n}\left( {{P_{{A_{i}|A^{i - 1}},V^{i - 1}}\left( {\left. {a_{i}} \middle| a^{i - 1} \right.,v^{i - 1}} \right)} \otimes {P_{{V_{i}|V^{i - 1}},A^{i}}\left( {\left. {v_{i}} \middle| v^{i - 1} \right.,a^{i}} \right)}} \right)}} \\{= {\otimes_{i = 0}^{n}{\left( {{P_{{A_{i}|A^{i - 1}},V^{i - 1}}\left( {\left. {a_{i}} \middle| a^{i - 1} \right.,v^{i - 1}} \right)} \otimes {P_{{V_{i}|V^{i - 1}},A_{i - L}^{i}}\left( {\left. {v_{i}} \middle| v^{i - 1} \right.,a_{i - L}^{i}} \right)}} \right).}}}\end{matrix}$
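By way of illustration only, a brief Python sketch of this model in the scalar case (q = p = 1, L = 2) follows, with independent Laplacian samples standing in for the non-Gaussian noise; the tap coefficients and the i.i.d. input are illustrative assumptions, and the sketch does not model the causal dependence of the noise on the inputs described above.

```python
import numpy as np

# Sketch of the moving-average channel B_i = sum_{j=1}^L D_{i,j} A_{i-j} + V_i.
rng = np.random.default_rng(2)
n, L = 10, 2
D = [0.0, 1.0, 0.5]                      # D[j] multiplies A_{i-j}, j = 1..L (placeholders)
A = rng.normal(size=n + 1)               # i.i.d. input sequence for illustration
B = np.zeros(n + 1)
for i in range(n + 1):
    isi = sum(D[j] * A[i - j] for j in range(1, L + 1) if i - j >= 0)
    B[i] = isi + rng.laplace()           # non-Gaussian (Laplacian) noise stand-in
print(np.round(B, 3))
```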

The channel conditional distribution of the example nonstationary MIMO ANonGN channel is

${{\mathbb{P}}\left\{ {\left. {B_{i} \leq b_{i}} \middle| B^{i - 1} \right.,A^{i}} \right\}} = {{\mathbb{P}}\left\{ {\left. {V_{i} \leq {b_{i} - {\sum\limits_{j = 1}^{L}\; {D_{i,j}A_{i - j}}}}} \middle| V^{i - 1} \right.,A_{i - L}^{i}} \right\}}$or${{\mathbb{P}}\left\{ {B_{i} \leq b_{i}} \middle| B^{i - 1} \right\}} = {\int_{b_{i - L}^{i}}{{\mathbb{P}}{\left\{ {\left. {V_{i} \leq {b_{i} - {\sum\limits_{j = 1}^{L}\; {D_{i,j}a_{i - j}}}}} \middle| V^{i - 1} \right.,a_{i - L}^{i}} \right\} \otimes {P_{{A_{i}|A_{i - L}^{i - 1}},V^{i - 1}}\left( {\left. {da}_{i} \middle| a_{i - L}^{i - 1} \right.,V^{i - 1}} \right)} \otimes {{P_{A_{i - L}^{i - 1}|V^{i - 1}}\left( {da}_{i - L}^{i - 1} \middle| V^{i - 1} \right)}.}}}}$

An example characterization of the finite block length feedback capacity for the MIMO Additive Non-Gaussian Noise channels with memory may be expressed as:

$$C_{A^n\rightarrow B^n}^{FB,ANonGN}(\kappa) = \sup_{\{P_{A_i|A_{i-L}^{i-1},V^{i-1}}(a_i|a_{i-L}^{i-1},v^{i-1}),\,i=0,\ldots,n:\ E\{\sum_{i=0}^{n}\gamma_i(A_{i-L}^{i},B^{i-1})\} \le \kappa\}}\Big\{\sum_{i=0}^{n}\big[H(B_i|B^{i-1}) - H(V_i|V^{i-1},A_{i-L}^{i})\big]\Big\},$$

where the transition probability distribution of the channel output process $\{B_i: i=0,1,\ldots,n\}$ is given by the above mentioned model.

If the noise process is non-Gaussian with conditional distribution:

$$\{P_{V_i|V^{i-1},A^{i}} = P_{V_i|V_{i-L}^{i-1},A_{i-L}^{i}}: i=0,1,\ldots,n\},$$

and the instantaneous transmission cost is:

$$\gamma_i(a_{i-L}^{i-1},b^{i-1}) \triangleq \gamma_i^{1}(a_{i-L}^{i},b_{i-L}^{i-1}),\quad i=0,\ldots,n,$$

then another example characterization of the finite block length feedback capacity for a channel in this class of channels is

$$C_{A^n\rightarrow B^n}^{FB,ANonGN,L}(\kappa) = \sup_{\{P_{A_i|A_{i-L}^{i-1},V_{i-L}^{i-1}}(a_i|a_{i-L}^{i-1},v_{i-L}^{i-1}),\,i=0,\ldots,n:\ E\{\sum_{i=0}^{n}\gamma_i^{1}(A_{i-L}^{i},B_{i-L}^{i-1})\} \le \kappa\}}\Big\{\sum_{i=0}^{n}\big[H(B_i|B^{i-1}) - H(V_i|V_{i-L}^{i-1},A_{i-L}^{i})\big]\Big\},$$

where

$$\mathbb{P}\{B_i \le b_i \mid B^{i-1}\} = \int\mathbb{P}\Big\{V_i \le b_i - \sum_{j=1}^{L}D_{i,j}a_{i-j} \,\Big|\, V_{i-L}^{i-1},a_{i-L}^{i-1}\Big\}\otimes P_{A_i|A_{i-L}^{i-1},V_{i-L}^{i-1}}(da_i|a_{i-L}^{i-1},V_{i-L}^{i-1})\otimes P_{A_{i-L}^{i-1}|V_{i-L}^{i-1}}(da_{i-L}^{i-1}|V_{i-L}^{i-1}),\quad i=0,1,\ldots,n.$$

In another example, the noise process is Gaussian with conditional distribution:

$$\{P_{V_i|V^{i-1},A^{i}} = P_{V_i|V^{i-1},A_{i-L}^{i}}: i=0,1,\ldots,n\},$$

and the instantaneous transmission cost function is:

$$\gamma_i(a_{i-L}^{i-1},b^{i-1}) \triangleq \gamma_i^{1}(a_{i-L}^{i},b_{i-L}^{i-1}),\quad i=0,\ldots,n.$$

In this case, the optimal channel input distribution is:

$$P_{A_i|A_{i-L}^{i-1},V^{i-1}}^{*}(a_i|a_{i-L}^{i-1},v^{i-1}),\quad i=0,\ldots,n,$$

and it is conditionally Gaussian. A Gaussian process

$$\{A_i^g: i=0,1,\ldots,n\}$$

realizes this distribution, where

${A_{i}^{g} = {{\sum\limits_{j = 1}^{L}\; {\Gamma_{i,j}^{1}A_{i - j}^{g}}} + {\sum\limits_{j = 0}^{i - 1}\; {\Gamma_{i,j}^{2}V_{j}}} + U_{i}}},{i = 0},1,\ldots,{n.}$

That is, at each time i=0, 1, \ldots, n, the Gaussian process is a linear combination of $\{A_{i-L}^{g,i-1},V^{i-1}\}$ and the Gaussian random variable $U_i$.
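By way of illustration only, the Python sketch below simulates a scalar instance of this realization with hypothetical time-invariant coefficients $\Gamma^{1}_{i,j}$, geometrically decaying $\Gamma^{2}_{i,j}$, and i.i.d. $U_i$; it merely illustrates the linear dependence of $A_i^g$ on past inputs, the noise realizations, and $U_i$.

```python
import numpy as np

# Scalar sketch of A_i = sum_{j=1}^L G1_j A_{i-j} + sum_{j<i} G2**(i-j) V_j + U_i.
rng = np.random.default_rng(4)
n, L = 10, 2
G1 = [0.4, 0.2]                          # Gamma^1_{i,j}, j = 1..L (placeholder values)
g2 = 0.3                                 # Gamma^2_{i,j} = g2**(i-j), hypothetical decay
V = rng.normal(size=n + 1)               # channel noise realizations fed back
A = np.zeros(n + 1)
for i in range(n + 1):
    ar = sum(G1[j - 1] * A[i - j] for j in range(1, L + 1) if i - j >= 0)
    noise_fb = sum(g2 ** (i - j) * V[j] for j in range(i))
    A[i] = ar + noise_fb + rng.normal()  # U_i ~ N(0, 1)
print(np.round(A, 3))
```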

In yet another example, the noise process is Gaussian and satisfies

$$\{P_{V_i|V^{i-1},A^{i}} = P_{V_i|V_{i-L}^{i-1},A_{i-L}^{i}}: i=0,1,\ldots,n\}.$$

In this case, the optimal channel input distribution is:

P _(A) _(i) _(|A) _(i−L) _(i−1) _(,V) _(i−L) _(i−1) *(a _(i) |a _(i−L)^(i−1) ,v _(i−L) ^(i−1)), i=0, . . . ,n

and it is conditionally Gaussian. A Gaussian process

{A _(i) ^(g) :i=0,1, . . . ,n}

realizes this distribution, where

${A_{i}^{g} = {{\sum\limits_{j = 1}^{L}\; {\Gamma_{i,j}^{1}A_{i - j}^{g}}} + {\sum\limits_{j = 1}^{L}\; {\Gamma_{i,j}^{2}V_{i - j}}} + U_{i}}},{i = 0},1,\ldots \mspace{14mu},n$

In still another example, the noise process is scalar Gaussian, $A^{n}$ is causally related to the noise, and the noise satisfies:

$\{ P_{V_{i}|V^{i-1},A^{i}} = P_{V_{i}|V^{i-1}} : i = 0,1,\ldots,n \},$

and the instantaneous transmission cost function is:

$\gamma_{i}(a_{i-L}^{i},b^{i-1}) = \gamma(a_{i}), \quad i = 0,1,\ldots,n.$

In this case, the Gaussian process

$\{ A_{i}^{g} : i = 0,1,\ldots,n \},$

defined by:

$A_{i}^{g} = \sum_{j=0}^{i-1} \Gamma_{i,j}^{2}V_{j} + U_{i}, \quad i = 0,1,\ldots,n,$

is a realization of the optimal channel input distribution. Further, if

$\{ P_{V_{i}|V^{i-1}} = P_{V_{i}|V_{i-L}^{i-1}} : i = 0,1,\ldots,n \}$

is stationary, the Gaussian process realization further reduces to:

$A_{i}^{g} = \sum_{j=1}^{L} \Gamma_{j}^{2}V_{i-j} + U_{i}, \quad i = 0,1,\ldots,n.$
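For concreteness, the stationary realization above can be simulated directly; the following Python sketch generates the channel input process $A_{i}^{g}$ from the last L noise samples plus an independent Gaussian innovation. The coefficients and variances are hypothetical placeholders chosen only for illustration, not values prescribed by the present disclosure.

import numpy as np

# Minimal sketch of the stationary Gaussian realization
#   A_i^g = sum_{j=1..L} Gamma_j^2 * V_{i-j} + U_i
# The coefficients gamma[] and the unit variances are hypothetical.
rng = np.random.default_rng(0)
n, L = 100, 2
gamma = np.array([0.5, -0.25])      # hypothetical Gamma_j^2 coefficients
V = rng.normal(0.0, 1.0, size=n)    # stationary channel noise samples
U = rng.normal(0.0, 1.0, size=n)    # independent Gaussian innovations

A = np.zeros(n)
for i in range(n):
    for j in range(1, L + 1):
        if i - j >= 0:
            A[i] += gamma[j - 1] * V[i - j]
    A[i] += U[i]

print(A[:5])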

B.10. Necessary and Sufficient Conditions for Feedback not to Increase Capacity

The example two-step procedure (described with reference to FIGS. 2, 3, 4, and 5) may allow computing devices and/or operators of communication systems to determine necessary and sufficient conditions for feedback encoding to not increase capacity for a channel with memory, such as the channel 106. For such channels, feedback encoding does not increase capacity, and the characterization of finite block length capacity with and without feedback is the same. Further, the capacity with and without feedback is the same in these cases. Example characterizations of the finite block length capacity without feedback and the capacity without feedback, developed according to the two-step procedure, are described below along with necessary and sufficient conditions for feedback encoding not to increase the channel capacity of channels with memory.

If an example channel has memory and an instantaneous transmission cost constraint $\mathcal{P}_{[0,n]}(\kappa)$, then

$C_{A^{n};B^{n}}^{noFB}(\kappa) \leq C_{A^{n} \rightarrow B^{n}}^{FB}(\kappa),$

and feedback encoding does not provide additional gain compared to encoding without feedback if and only if the following identity holds:

$C_{A^{n} \rightarrow B^{n}}^{FB}(\kappa) = C_{A^{n};B^{n}}^{noFB}(\kappa).$

Further, feedback encoding does not increase capacity without feedback if and only if:

${\liminf\limits_{n\rightarrow\infty}\frac{1}{n + 1}{C_{A^{n}\rightarrow B^{n}}^{FB}(\kappa)}} = {\liminf\limits_{n\rightarrow\infty}\frac{1}{n + 1}{C_{A^{n};B^{n}}^{noFB}(\kappa)}}$

where the limits are finite.

Next, further example notation is introduced. Specifically, let

$\begin{matrix}{{I\left( {A^{n};B^{n}} \right)} = {\sum\limits_{i = 0}^{n}\; {E\left\{ {\log \left( {\frac{{dP}_{{B_{i}|B^{i - 1}},A^{i}}\left( {{\cdot \left| B^{i - 1} \right.},A^{i}} \right)}{{dP}_{B_{i}|B^{i - 1}}^{npFB}\left( {\cdot \left| B^{i - 1} \right.} \right)}\left( B_{i} \right)} \right)} \right\}}}} \\{{\equiv {_{A^{n};B^{n}}\left( \left\{ {P_{A_{i}|A^{i - 1}}^{npFB},{{P_{B_{i}|{B^{i - 1}.A^{i}}}\text{:}i} = 0},1,\ldots \mspace{14mu},n} \right\} \right)}}}\end{matrix}$

where I(A^(n);B^(n)) is a functional of the channel distribution and thechannel input distribution without feedback denoted by

{P _(A) _(i) _(|A) _(i−1) ^(noFB) :i=0,1, . . . ,n}ε

_([0,n])(

)

The maximum information structure without feedback, in this examplenotation, is

{a ^(i−1) }, i=0,1, . . . ,n.

Also, let

$\begin{matrix}{{I\left( A^{n}\rightarrow B^{n} \right)} = {\sum\limits_{i = 0}^{n}\; {E\left\{ {\log \left( {\frac{P_{{B_{i}|B^{i - 1}},A^{i}}\left( {{\cdot \left| B^{i - 1} \right.},A^{i}} \right)}{P_{B_{i}|B^{i - 1}}^{FB}\left( {\cdot \left| B^{i - 1} \right.} \right)}\left( B_{i} \right)} \right)} \right\}}}} \\{{\equiv {_{A^{n}\rightarrow B^{n}}\left( \left\{ {P_{{A_{i}|A^{i - 1}},B^{i - 1}}^{FB},{{P_{{B_{i}|B^{i - 1}},A^{i}}\text{:}i} = 0},1,\ldots \mspace{14mu},n} \right\} \right)}}}\end{matrix}$

That is, I(A^(n)→B^(n)) is a functional of the channel distribution andthe channel input distribution with feedback denoted by

{P _(A) _(i) _(|A) _(i−1) _(,B) _(i−1) ^(FB) :i=0,1, . . . ,n}ε

_([0,n])(

)

The maximum information structure without feedback, in this examplenotation, is

{a ^(i−1) ,b ^(i−1) }, i=0,1, . . . ,n.

Using this example notation, for a channel with memory and an encoder with a transmission cost constraint, the finite block length capacity without feedback with transmission cost is:

$C_{A^{n};B^{n}}^{noFB}(\kappa) = \sup_{\{ P_{A_{i}|A^{i-1}}^{noFB}(da_{i}|a^{i-1}) : i = 0,\ldots,n \} \in \mathcal{P}_{[0,n]}(\kappa)} \mathbb{I}_{A^{n} \rightarrow B^{n}}\left( \{ P_{A_{i}|A^{i-1}}^{noFB},\ P_{B_{i}|B^{i-1},A^{i}} : i = 0,1,\ldots,n \} \right)$

and, similarly, the finite block length capacity with feedback with transmission cost is:

$C_{A^{n} \rightarrow B^{n}}^{FB}(\kappa) = \sup_{\{ P_{A_{i}|A^{i-1},B^{i-1}}^{FB}(da_{i}|a^{i-1},b^{i-1}) : i = 0,\ldots,n \} \in \mathcal{P}_{[0,n]}(\kappa)} \mathbb{I}_{A^{n} \rightarrow B^{n}}\left( \{ P_{A_{i}|A^{i-1},B^{i-1}}^{FB},\ P_{B_{i}|B^{i-1},A^{i}} : i = 0,1,\ldots,n \} \right)$

Also, define a set satisfying conditional independence as

$\mathcal{P}_{[0,n]}^{CI} \triangleq \{ P_{A_{i}|A^{i-1},B^{i-1}}^{FB}(da_{i}|a^{i-1},b^{i-1}) = P_{A_{i}|A^{i-1}}^{noFB}(da_{i}|a^{i-1}) - a.a.\ (a^{i-1},b^{i-1}) : i = 0,\ldots,n \}.$

The characterization of finite block length capacity with feedback is equal to the characterization of finite block length capacity without feedback if and only if the corresponding optimal channel input distribution of the former belongs to the above set. Thus, $C_{A^{n};B^{n}}^{noFB}(\kappa)$ and $C_{A^{n} \rightarrow B^{n}}^{FB}(\kappa)$ are related by

$C_{A^{n};B^{n}}^{noFB}(\kappa) = \sup_{\{ P_{A_{i}|A^{i-1},B^{i-1}}^{FB}(da_{i}|a^{i-1},b^{i-1}) : i = 0,\ldots,n \} \in \mathcal{P}_{[0,n]}(\kappa) \cap \mathcal{P}_{[0,n]}^{CI}} \mathbb{I}_{A^{n} \rightarrow B^{n}}\left( \{ P_{A_{i}|A^{i-1},B^{i-1}}^{FB},\ P_{B_{i}|B^{i-1},A^{i}} : i = 0,1,\ldots,n \} \right)$

For a memoryless channel, this condition holds because the optimal channel input distribution which corresponds to the characterization of finite block length feedback capacity satisfies

$P_{A_{i}|A^{i-1},B^{i-1}}^{FB}(da_{i}|a^{i-1},b^{i-1}) = P_{A_{i}}(da_{i}) - a.a.\ (a^{i-1},b^{i-1}), \quad i = 0,\ldots,n.$

Also, the optimal channel input distribution which corresponds to the characterization of finite block length capacity without feedback satisfies

$P_{A_{i}|A^{i-1}}^{noFB}(da_{i}|a^{i-1}) = P_{A_{i}}(da_{i}) - a.a.\ a^{i-1}, \quad i = 0,\ldots,n.$

For any example channel in classes A, B, and C and any instantaneous transmission cost function in classes A, B, and C, let

$\{ P(db_{i}|\mathcal{I}_{i}^{Q}) : \mathcal{I}_{i}^{Q} \subset \{ a^{i},b^{i-1} \},\ i = 0,\ldots,n \}$

denote the channel distribution and let

$\{ P^{FB,*}(da_{i}|\mathcal{I}_{i}^{FB}) : i = 0,1,\ldots,n \} \in \mathcal{P}_{[0,n]}^{FB}(\kappa) \subset \mathcal{P}_{[0,n]}(\kappa), \quad \mathcal{I}_{i}^{FB} \subset \{ a^{i-1},b^{i-1} \},\ i = 0,1,\ldots,n,$

denote the channel input distribution corresponding to the characterization of the finite block length feedback capacity $C_{A^{n} \rightarrow B^{n}}^{FB}(\kappa)$. With these example definitions, feedback does not increase the finite block length capacity without feedback (i.e., $C_{A^{n};B^{n}}^{noFB}(\kappa) = C_{A^{n} \rightarrow B^{n}}^{FB}(\kappa)$) if and only if there exists a channel input distribution (without feedback)

$\{ P^{*,noFB}(da_{i}|\mathcal{I}_{i}^{noFB}) : i = 0,1,\ldots,n \}, \quad \mathcal{I}_{i}^{noFB} \subset \{ a^{i-1} \},\ i = 0,1,\ldots,n,$

which induces the joint distribution

$P_{A^{n},B^{n}}^{FB,*}(da^{n},db^{n})$

and the channel output distribution

$P_{B^{n}}^{FB,*}(db^{n})$

corresponding to the pair

$\{ P^{FB,*}(da_{i}|\mathcal{I}_{i}^{FB}),\ P(db_{i}|\mathcal{I}_{i}^{Q}) : i = 0,\ldots,n \}.$

C. Designing Information Lossless Encoders to Achieve Characterized Capacities

In addition to characterizing the "finite block length" feedback capacity for channels, such as the channel 106, the techniques of the present disclosure include methods to design encoders that achieve characterized capacities of channels with memory. These capacity achieving encoders are "information lossless encoders," in that the mapping, implemented by the encoders, of information from a source to encoded symbols is sequentially invertible. Encoders violating this property are not able to achieve the capacity of a given channel.

Further, for each of the example channels (e.g., A, B, and C) discussed herein, a computing device and/or operator of a communication system may generate specific coding schemes or mappings for the "information lossless" encoder based on the characterization of the capacity for the channel and a corresponding optimal channel input distribution. That is, all capacity achieving encoders may be "information lossless" encoders, but, for a given channel, a computing device or operator may generate a specific "information lossless" coding scheme based on a characterization of capacity for that channel. A computing device and/or operator may also define additional conditions of the information lossless encoder for a specific channel based on a characterization of capacity for the specific channel.

The optimal (i.e., "information lossless") encoding schemes designed according to the method discussed below may reduce the complexity of communication systems and operate optimally. For example, optimal operation may include an optimal operation in terms of the overall number of processing elements (e.g., CPUs) required to process transmissions and/or the number of memory elements and steps required to encode and decode messages. Such optimal encoding and decoding schemes may require small processing delays and short code lengths in comparison to encoding and decoding schemes designed based on an assumption of a channel without memory or based on a separate treatment of source codes and channel codes.

Although encoders and corresponding necessary and sufficient conditions discussed below are described, by way of example, with reference to the example communication system 100, which system 100 is a point-to-point communication system, encoders of the present disclosure may be applied to or implemented in systems other than point-to-point communication systems. For example, encoders of the present disclosure may be implemented in multi-user and network communication systems by repeating the procedures described below (for each user, for each node of a network, etc.). Implementations of encoders may even be utilized in joint collaborative communication.

C.1. Encoder Design Overview and Methods

FIG. 6 illustrates an example process 600 for designing an information lossless encoder for a particular channel, such as the channel 106. The information lossless encoder 602 designed according to the process 600 may achieve capacities, such as finite block length feedback capacities, feedback capacities, finite block length feedback capacities with transmission cost, feedback capacities with transmission cost, finite block length capacities without feedback, capacities without feedback, finite block length capacities without feedback and with transmission cost, and capacities without feedback and with transmission cost. These types of capacities are further described in sections B.2. and B.3. entitled "Capacity Without Feedback" and "Capacity With Feedback," respectively. A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured (e.g., by one or more algorithms, routines, modules, or engines) to implement at least a portion of the process 600. Further, components of the example communication system 100, such as the encoder 104, may be configured according to the output of the process 600.

In the process 600, an information lossless condition 604, a characterization of capacity 606 (e.g., finite block length feedback capacity), and an optimal channel input distribution 608 are input into an encoder design procedure 610. The characterization of capacity 606 and the optimal channel input distribution 608 may be generated according to an implementation of the two-step procedure (described with reference to FIGS. 2, 3, 4, and 5). The information lossless condition 604 may be defined based on a desired invertibility of the resulting information lossless encoder 602. Specifically, the information lossless condition 604 may be defined, for a given class of channels, in terms of directed information measures, as discussed in further detail below with reference to example classes of channels.

The encoder design procedure 610 may, based on the information lossless condition 604, the characterization of capacity 606, and the optimal channel input distribution 608, generate the information lossless encoder 602. The information lossless encoder 602 may be utilized in an implementation of the system 100 as encoder 104, for example. Also, although not emphasized in the below description, decoders corresponding to the information lossless encoder 602 may also be generated by the encoder design procedure 610 based on the information lossless condition. In this manner, computing devices and/or operators of communication systems may design encoding and decoding processing for a communication system, such as the system 100.

Generally, an encoding process may encode received symbols $x^{n}$ from a source, such as the source 102, into channel input symbols $a^{n} \triangleq \{ a_{0},a_{1},\ldots,a_{n} \}$, $a_{j} \in \mathbb{X}_{j}$, where $j = 0,1,\ldots,n$, as further discussed with reference to FIG. 1. FIG. 7 illustrates such a procedure, in which source information 702, such as sampled speech signals, digital representations of photographs, etc., may be encoded by an encoder 706 to produce encoded symbols 704. As illustrated by the arrows in FIG. 7, this encoder 706 implements an encoding scheme that is invertible. That is, the encoded symbols 704 may be mapped back to the source information 702, and, thus, no information is lost (i.e., the encoder is "information lossless").

Although FIG. 7 illustrates certain information, symbols, and mapping by way of example, implementations of information lossless encoders may encode any suitable types and formats of data for transmission over a channel other than those illustrated in FIG. 7. Further, encoding schemes may map any number of received symbols from a source to any number of encoded symbols.
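As a toy numerical illustration of the invertible mapping of FIG. 7, the following sketch encodes source symbols through a one-to-one table and recovers them exactly by inverting the table; the alphabets and the code table are hypothetical, chosen only for illustration.

# Toy illustration of an invertible (information lossless) symbol map.
# The code table is hypothetical; any one-to-one, onto map would do.
code = {"00": "a", "01": "b", "10": "c", "11": "d"}
inverse = {v: k for k, v in code.items()}

source = ["00", "11", "01"]
encoded = [code[x] for x in source]
decoded = [inverse[a] for a in encoded]
assert decoded == source  # no information lost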

FIG. 8 is a flow diagram of an example method 800 for designing a capacity achieving and information lossless encoder. A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured to implement at least a portion of the method 800. Further, in some implementations, a suitable combination of a computing device and an operator of a communication system, such as the communication system 100, may implement the method 800.

In the method 800, an information lossless condition is determined for a class of channels (block 802). For example, a computing device or operator may determine an information lossless condition for one or more of the classes of channels described in section B.1. entitled "Characterizing Channels." The determination of the information lossless condition may include utilizing a general definition of information lossless encoders along with properties of a channel defined in a channel model to generate one or more specific information lossless conditions for a specific class of channels. Example determinations utilizing this procedure are discussed further below for example channels in classes A, B, and C (as defined in section B.1).

The method 800 may also include receiving a characterization of channel capacity and a corresponding channel input distribution (block 804). The characterization of channel capacity and corresponding input distribution may correspond to the class of channels associated with the information lossless conditions determined at block 802. The characterization or formula for the capacity may be a characterization or formula for a finite block length feedback capacity, feedback capacity, finite block length feedback capacity with transmission cost, feedback capacity with transmission cost, finite block length capacity without feedback, capacity without feedback, finite block length capacity without feedback and with transmission cost, or capacity without feedback and with transmission cost. Further, the characterization of channel capacity and corresponding channel input distribution may be an output from the two-step procedure (described with reference to FIGS. 2, 3, 4, and 5).

A computing device and/or operator of a communication system may then utilize the information lossless condition(s), characterization of channel capacity, and channel input distribution to determine an encoding scheme (block 806). That is, the computing device and/or operator may design the encoding scheme based on both properties of the channel (e.g., capacity and optimal channel input distribution) and necessary and sufficient conditions for any encoder to achieve the capacity of a channel with memory. Certain of these optimal or capacity achieving encoding schemes for specific classes of channels are discussed in sections B.6.1.2.2., B.6.2.2., and B.7.1.2.

C.2. Example Information Lossless Conditions

For further clarification and by way of example, the sections below include necessary and sufficient conditions for any encoder of example classes A and B to be information lossless. Based on these conditions and based on characterizations of capacities and optimal channel input distributions, computing devices and/or operators may design encoders for transmission of information over channels in the example classes A and B. The example corresponding to class A includes an encoder with feedback, and the example corresponding to class B includes an encoder without feedback.

C.2.1. Feedback Encoder Corresponding to Example Class A

A feedback encoder corresponding to the example class A (e.g., class A channels) may be referred to herein as

$e^{n} \in \mathcal{E}_{[0,n]}.$

The information structure entering the example encoder at any time i may be expressed as $\{ a^{i-1},x^{i},b^{i-1} \}$.

By substituting $a^{i-1}$ recursively into the right side of

$a_{i} = e_{i}(a^{i-1},x^{i},b^{i-1}),$

then

$a_{i} = e_{i}(a^{i-1},x^{i},b^{i-1}) \equiv \bar{e}_{i}(x^{i},b^{i-1}), \quad i = 0,\ldots,n.$

Thus, for any feedback encoder of example class A, the information structure of the encoder at each time instant i is:

$\mathcal{I}_{i}^{e} \triangleq \{ a^{i-1},x^{i},b^{i-1} \} \equiv \{ x^{i},b^{i-1} \}, \quad i = 0,\ldots,n,$

and this information structure is the most general classical information structure among all possible deterministic nonanticipative encoders with feedback.

Given any feedback encoder of example class A and any source and channel distributions, the information from the source to the channel output is the directed information defined by

$I(X^{n} \rightarrow B^{n}) \triangleq \sum_{i=0}^{n} I(X^{i};B_{i}|B^{i-1}) \overset{(\alpha)}{=} \sum_{i=0}^{n} I(X^{i},A^{i};B_{i}|B^{i-1}).$

Also, given any encoder of class A, the following chain rule of conditional mutual information holds:

$I(X^{i};B_{i}|B^{i-1}) = I(X^{i},A^{i};B_{i}|B^{i-1}) = I(A^{i};B_{i}|B^{i-1}) + I(X^{i};B_{i}|B^{i-1},A^{i}) \geq I(A^{i};B_{i}|B^{i-1}), \quad i = 0,1,\ldots,n,$

where the inequality is due to the nonnegativity of conditional mutual information. In fact, the following stronger version of this expression holds: for any $e^{n} \in \mathcal{E}_{[0,n]}$,

$\log\left( \frac{P_{B_{i}|B^{i-1},X^{i}}}{P_{B_{i}|B^{i-1}}} \right) = \log\left( \frac{P_{B_{i}|B^{i-1},A^{i}}}{P_{B_{i}|B^{i-1}}} \right) + \log\left( \frac{P_{B_{i}|B^{i-1},A^{i},X^{i}}}{P_{B_{i}|B^{i-1},A^{i}}} \right), \quad i = 0,1,\ldots,n.$

A feedback encoder of example class A is "information lossless" with respect to the directed information measures

$I(X^{n} \rightarrow B^{n}) \quad \text{and} \quad I(A^{n} \rightarrow B^{n})$

if

$I(X^{n} \rightarrow B^{n}) = I(A^{n} \rightarrow B^{n}), \quad \forall e^{n} \in \mathcal{E}_{[0,n]}^{IL} \subset \mathcal{E}_{[0,n]}.$

A sufficient condition for a feedback encoder of example class A to be "information lossless" according to this definition is based on the following conditional independence:

MC1: $X^{i} \leftrightarrow (A^{i},B^{i-1}) \leftrightarrow B_{i}, \quad i = 1,\ldots,n.$

Given any feedback encoder of example class A, if

$X^{i} \leftrightarrow (A^{i},B^{i-1}) \leftrightarrow B_{i}, \quad \text{then} \quad I(X^{i};B_{i}|B^{i-1},A^{i}) = 0,$

and the following identity holds:

$I(X^{i};B_{i}|B^{i-1}) = I(A^{i};B_{i}|B^{i-1}), \quad i = 0,1,\ldots,n.$

Hence, any class of functions which induces MC1 or, equivalently, induces the conditional independence on the sequence of channel conditional distributions

$P_{B_{i}|B^{i-1},A^{i},X^{i}} = P_{B_{i}|B^{i-1},A^{i}}, \quad i = 0,1,\ldots,n,$

is an information lossless class of functions.

Necessary and sufficient conditions for any class of functions to be an information lossless class, for a feedback encoder of an example class A channel, may be expressed as follows:

A class A encoder is information lossless if:

(i) for fixed $b^{-1} \in \mathbb{B}^{-1}$, $e_{0}(\cdot,b^{-1}) : \mathbb{X}_{0} \mapsto \mathbb{A}_{0}$ is one-to-one and onto $\mathbb{A}_{0}$, and its inverse $\varphi_{0} \triangleq e_{0}^{-1}(\cdot,b^{-1}) : \mathbb{A}_{0} \mapsto \mathbb{X}_{0}$ is measurable;

(ii) for fixed $(a_{0},x_{0},b^{-1},b_{0}) \in \mathbb{A}_{0} \times \mathbb{X}_{0} \times \mathbb{B}^{-1} \times \mathbb{B}_{0}$, $e_{1}(a_{0},x_{0},\cdot,b^{-1},b_{0}) : \mathbb{X}_{1} \mapsto \mathbb{A}_{1}$ is one-to-one and onto $\mathbb{A}_{1}$, and its inverse $\varphi_{1} \triangleq e_{1}^{-1}(a_{0},\cdot,x_{0},b^{-1},b_{0}) : \mathbb{A}_{1} \mapsto \mathbb{X}_{1}$ is measurable;

(iii) for any $i = 2,3,\ldots,n$ and fixed $(a^{i-1},x^{i-1},b^{i-1}) \in \mathbb{A}^{i-1} \times \mathbb{X}^{i-1} \times \mathbb{B}^{i-1}$, $e_{i}(a^{i-1},x^{i-1},\cdot,b^{i-1}) : \mathbb{X}_{i} \mapsto \mathbb{A}_{i}$ is one-to-one and onto $\mathbb{A}_{i}$, and its inverse $\varphi_{i} \triangleq e_{i}^{-1}(a^{i-1},\cdot,x^{i-1},b^{i-1}) : \mathbb{A}_{i} \mapsto \mathbb{X}_{i}$ is measurable.

All of the examples of capacity achieving encoders with feedback, discussed in the above sections, satisfy these necessary and sufficient conditions and are, thus, information lossless.

This class of information lossless encoders also satisfies the following conditional independence:

MC2: $A^{i} \leftrightarrow (X^{i},B^{i-1}) \leftrightarrow B_{i}, \quad i = 0,1,\ldots,n.$

Still further, the following stronger identity holds for the information lossless encoders:

$\forall e^{n} \in \mathcal{E}_{[0,n]}^{IL}: \quad \log\left( \frac{P_{B_{i}|B^{i-1},X^{i}}}{P_{B_{i}|B^{i-1}}} \right) = \log\left( \frac{P_{B_{i}|B^{i-1},A^{i}}}{P_{B_{i}|B^{i-1}}} \right), \quad i = 0,1,\ldots,n.$
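A minimal numerical sketch of these conditions, under hypothetical gains and a hypothetical additive-noise channel: the feedback encoder below is affine, and therefore one-to-one and onto, in the fresh symbol $x_{i}$ for fixed $(a^{i-1},b^{i-1})$, and the source symbols are recovered exactly by sequential inversion.

import numpy as np

# Sketch of a sequentially invertible (information lossless) feedback
# encoder: a_i = e_i(a^{i-1}, x^i, b^{i-1}) is affine in the fresh
# symbol x_i, hence invertible for fixed other arguments.
# The gains and the additive-noise channel are hypothetical.
rng = np.random.default_rng(1)
n = 10
x = rng.normal(size=n)              # source symbols
a = np.zeros(n)                     # channel inputs
b = np.zeros(n)                     # channel outputs

for i in range(n):
    fb = b[i - 1] if i > 0 else 0.0     # feedback term b_{i-1}
    past = a[i - 1] if i > 0 else 0.0   # past input a_{i-1}
    a[i] = 2.0 * x[i] + 0.3 * past - 0.1 * fb   # e_i, invertible in x_i
    b[i] = a[i] + rng.normal(scale=0.1)         # noisy channel

# Sequential inversion: recover x_i from (a^i, b^{i-1}).
x_hat = np.zeros(n)
for i in range(n):
    fb = b[i - 1] if i > 0 else 0.0
    past = a[i - 1] if i > 0 else 0.0
    x_hat[i] = (a[i] - 0.3 * past + 0.1 * fb) / 2.0

assert np.allclose(x_hat, x)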

C.2.2. Encoders without Feedback Corresponding to Example Class B

An encoder without feedback corresponding to the example class B may be referred to herein as

$e^{n} \in \mathcal{E}_{[0,n]}^{nfb}.$

The information structure entering the example encoder without feedback at any time i may be expressed as $\{ a^{i-1},x^{i} \}$.

By substituting $a^{i-1}$ recursively into the right side of

$a_{i} = e_{i}(a^{i-1},x^{i}),$

then

$a_{i} = e_{i}(a^{i-1},x^{i}) \equiv \bar{e}_{i}(x^{i}), \quad i = 0,\ldots,n.$

Thus, for any encoder without feedback of example class B, the information structure of the encoder at each time instant i is:

$\mathcal{I}_{i}^{e,nfb} \triangleq \{ a^{i-1},x^{i} \} \equiv \{ x^{i} \}, \quad i = 0,\ldots,n,$

and this information structure is the most general classical information structure among all possible deterministic nonanticipative encoders.

Given any encoder without feedback of example class B and any source and channel distributions, the information from the source to the channel output is the mutual information defined by

$I(X^{n};B^{n}) = \sum_{i=0}^{n} I(X^{n};B_{i}|B^{i-1}) = \sum_{i=0}^{n} I(X^{i};B_{i}|B^{i-1}) \overset{(\alpha)}{=} \sum_{i=0}^{n} I^{e}(X^{i},A^{i};B_{i}|B^{i-1}).$

An encoder without feedback of example class B is "information lossless" with respect to the mutual information measures

$I(X^{n};B^{n}) \quad \text{and} \quad I(A^{n};B^{n})$

if

$I(X^{n};B^{n}) = I(A^{n};B^{n}), \quad \forall e^{n} \in \mathcal{E}_{[0,n]}^{IL,nfb} \subset \mathcal{E}_{[0,n]}^{nfb}.$

Necessary and sufficient conditions for any class of functions to be an information lossless class, for an encoder without feedback of example class B, may be expressed as follows:

A class B encoder is information lossless if:

(i) $e_{0}(\cdot) : \mathbb{X}_{0} \mapsto \mathbb{A}_{0}$ is one-to-one and onto $\mathbb{A}_{0}$, and its inverse $\varphi_{0} \triangleq e_{0}^{-1}(\cdot) : \mathbb{A}_{0} \mapsto \mathbb{X}_{0}$ is measurable;

(ii) for fixed $(a_{0},x_{0}) \in \mathbb{A}_{0} \times \mathbb{X}_{0}$, $e_{1}(a_{0},x_{0},\cdot) : \mathbb{X}_{1} \mapsto \mathbb{A}_{1}$ is one-to-one and onto $\mathbb{A}_{1}$, and its inverse $\varphi_{1} \triangleq e_{1}^{-1}(a_{0},\cdot,x_{0}) : \mathbb{A}_{1} \mapsto \mathbb{X}_{1}$ is measurable;

(iii) for any $i = 2,3,\ldots,n$ and fixed $(a^{i-1},x^{i-1}) \in \mathbb{A}^{i-1} \times \mathbb{X}^{i-1}$, $e_{i}(a^{i-1},x^{i-1},\cdot) : \mathbb{X}_{i} \mapsto \mathbb{A}_{i}$ is one-to-one and onto $\mathbb{A}_{i}$, and its inverse $\varphi_{i} \triangleq e_{i}^{-1}(a^{i-1},\cdot,x^{i-1}) : \mathbb{A}_{i} \mapsto \mathbb{X}_{i}$ is measurable.

Still further, the following stronger identity holds for the information lossless encoders without feedback:

$\forall e^{n} \in \mathcal{E}_{[0,n]}^{IL,nfb}: \quad \log\left( \frac{P_{B_{i}|B^{i-1},X^{i}}}{P_{B_{i}|B^{i-1}}} \right) = \log\left( \frac{P_{B_{i}|B^{i-1},A^{i}}}{P_{B_{i}|B^{i-1}}} \right), \quad i = 0,1,\ldots,n.$
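On small finite alphabets, the one-to-one and onto requirements above can be verified exhaustively. The following sketch, with hypothetical alphabets and a hypothetical encoder map, checks that each section $x_{i} \mapsto e_{i}(a^{i-1},x^{i-1},x_{i})$ is a bijection for every fixed history.

from itertools import product

# Exhaustive check that x_i -> e_i(history, x_i) is one-to-one and onto
# for every fixed history, on small finite alphabets (all hypothetical).
X = [0, 1, 2]                # source alphabet
A = [0, 1, 2]                # channel input alphabet

def e_i(history, x):
    # Hypothetical encoder without feedback: a shift depending on history.
    return (x + sum(history)) % 3

for history in product(A, X):            # fixed (a_{i-1}, x_{i-1})
    image = {e_i(history, x) for x in X}
    assert len(image) == len(X) and image == set(A)  # bijective onto A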

D. Compressing Information with Zero-Delay

The below-discussed compressions of information with zero-delay utilize a "nonanticipative" (e.g., zero-delay) rate distortion function (RDF). This function, along with various other relevant quantities, is defined and discussed below before specifying a number of example compression schemes. Also, in the following discussion, a "non-stationary" source may refer to a source of information, such as the source 102, that varies in time.

D.1. Nonanticipative RDF of Non-Stationary Sources

Generally, an RDF may define a manner in which data is to be sent over a channel, such as the channel 106. For example, an RDF may define a number of bits per symbol of information that should be sent over a channel. The manner in which data is sent over a particular channel may be optimal or not depending on the particular RDF that defines compression for the particular channel. For example, to achieve a capacity, such as the capacities discussed in section B, a compression of information may utilize a nonanticipative RDF, as discussed below.

A nonanticipative RDF may be defined in terms of a "source distribution," a "reproduction distribution," and a "fidelity of reproduction," in an implementation. The source distribution may be a collection of conditional probability distributions:

$\{ P_{X_{t}|X^{t-1}}(dx_{t}|x^{t-1}) : t = 0,1,\ldots,n \}.$

The reproduction distribution may also be a collection of conditional probability distributions:

$\{ P_{Y_{t}|Y^{t-1},X^{t}}(dy_{t}|y^{t-1},x^{t}) : t = 0,1,\ldots,n \}.$

Also, to express the nonanticipative RDF, the following family of causal conditional distributions is defined:

$\overrightarrow{P}_{Y^{n}|X^{n}}(dy^{n}|x^{n}) \triangleq \otimes_{t=0}^{n} P_{Y_{t}|Y^{t-1},X^{t}}(dy_{t}|y^{t-1},x^{t}).$

Given the source distribution and reproduction distribution, the "joint distribution" is given by:

$P_{X^{n},Y^{n}}(dx^{n},dy^{n}) \triangleq P_{X^{n}}(dx^{n}) \otimes \overrightarrow{P}_{Y^{n}|X^{n}}(dy^{n}|x^{n}),$

and the "marginal distribution" is given by:

$P_{Y^{n}}(dy^{n}) \triangleq \int_{\mathbb{X}_{0,n}} P_{X^{n}}(dx^{n}) \otimes \overrightarrow{P}_{Y^{n}|X^{n}}(dy^{n}|x^{n}).$

The distortion function of reproducing $x_{t}$ by $y_{t}$, for $t = 0,1,\ldots,n$, may be a measurable function:

$d_{0,n} : \mathbb{X}_{0,n} \times \mathbb{Y}_{0,n} \mapsto [0,\infty], \quad d_{0,n}(x^{n},y^{n}) = \sum_{t=0}^{n} \rho_{t}(T^{t}x^{n},T^{t}y^{n}),$

where

$T^{t}x^{n} \subset \{ x_{0},x_{1},\ldots,x_{t} \}, \quad T^{t}y^{n} \subset \{ y_{0},y_{1},\ldots,y_{t} \}, \quad t = 0,1,\ldots,n.$

The fidelity set of reproduction conditional distributions is then defined by:

$\overrightarrow{\mathcal{Q}}_{0,n}(D) \triangleq \left\{ \overrightarrow{P}_{Y^{n}|X^{n}}(\cdot|x^{n}) : \frac{1}{n+1} \int_{\mathbb{X}_{0,n} \times \mathbb{Y}_{0,n}} d_{0,n}(x^{n},y^{n})\, P_{X^{n},Y^{n}}(dx^{n},dy^{n}) \leq D \right\},$

where $D \geq 0$.

The information measure of the nonanticipative RDF may be a special case of directed information defined by:

$I_{P_{X^{n}}}(X^{n} \rightarrow Y^{n}) \triangleq \int \log\left( \frac{\overrightarrow{P}_{Y^{n}|X^{n}}(y^{n}|x^{n})}{P_{Y^{n}}(y^{n})} \right) P_{X^{n},Y^{n}}(dx^{n},dy^{n}) \equiv \mathbb{I}_{X^{n} \rightarrow Y^{n}}(P_{X^{n}},\overrightarrow{P}_{Y^{n}|X^{n}}).$

The finite time nonanticipative RDF may be defined by:

$R_{0,n}^{na}(D) \triangleq \inf_{\overrightarrow{P}_{Y^{n}|X^{n}} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} \mathbb{I}_{X^{n} \rightarrow Y^{n}}(P_{X^{n}},\overrightarrow{P}_{Y^{n}|X^{n}}),$

and the nonanticipative RDF rate may be defined by:

${R^{na}(D)} = {\lim\limits_{n->\infty}{\frac{1}{n + 1}{R_{0,n}^{na}(D)}}}$

This RDF function may specify that $R^{na}(D)$ bits/symbol are to be transmitted over a channel, such as the channel 106, such that the distortion does not exceed D. The distortion D may be represented by any suitable distortion measure, such as the Hamming distortion measure, the squared-error distortion measure, etc.

D.2. Methods for Compressing Information with Zero-Delay

FIG. 9 is a flow diagram of an example method 900 for compressing information with zero-delay (e.g., for use in real-time communications). A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured to implement at least a portion of the method 900. Further, in some implementations, a suitable combination of a computing device, an operator of a communication system, such as the communication system 100, and one or more other suitable components of a communication system (e.g., routers, modems, gateways, etc.) may implement the method 900.

In the method 900, a computing device and/or operator of a communications system determines a nonanticipative rate distortion function (RDF) (block 902). The nonanticipative RDF may be a function, as described above in section D.1, that is not dependent on all transmitted symbols over a channel. Rather, the nonanticipative RDF may be causal in that the nonanticipative RDF only depends on previously transmitted information over a channel. In this manner, the determined nonanticipative RDF is zero-delay.

In some implementations, a computer and/or operator of a communications system may determine the nonanticipative RDF according to the definitions in section D.1 and further characterize the nonanticipative RDF according to properties of the channel over which information is to be transmitted. For example, a computer or operator may characterize the nonanticipative RDF according to properties of an AWGN channel, as further discussed in various examples presented below. In any event, the method 900 includes determining a nonanticipative RDF, where the nonanticipative RDF may be expressed in any suitable form including a general form for any suitable channel or a more specific characterization according to properties of a specific channel.

The method 900 also includes determining a rate of information transfer based on the RDF and an allowed amount of distortion (block 904). In some implementations, the allowed amount of distortion may be a number or expression representing an amount of distortion (e.g., Hamming or squared-error distortion) of information transmitted over one or more channels. The computer or operator of a communications system may determine the allowed amount of distortion based on a desired performance (e.g., efficiency) of transmitting information and/or based on desired qualities of the received information after transmission. For example, for information representing speech, an operator of a communication system may specify (e.g., by configuring a computer to compress information) an allowed amount of distortion such that transmitted speech signals are understandable to a human after being decoded.

The computer and/or operator may provide the allowed amount of distortion to the determined nonanticipative RDF (e.g., as input) to determine the rate of information transfer, which rate corresponds to the allowed amount of distortion. In other words, when the allowed amount of distortion is provided to the nonanticipative RDF, the nonanticipative RDF produces a corresponding rate of information transfer. If information is transferred over a channel at this rate, the information will be distorted (e.g., according to a squared-error distortion) at a level at or below the allowed amount of distortion. The computer and/or operator implementing portions of the method 900 may express the rate as a number of bits per source symbol, a number of bytes per source symbol, or any other suitable amount of data per symbol of source information.

In some implementations, the computer and/or operator may utilize a buffer or range in determining the rate of information transfer. For example, instead of utilizing one allowed amount of distortion, the computer and/or operator may determine a range of information transfer rates (e.g., bits/symbol) based on an allowed range of distortions. In other examples, the computer and/or operator may determine a rate of information transfer based on a proxy amount of distortion, which proxy defines a buffer between an actual allowed amount of distortion and the proxy amount. A simple numerical sketch of this rate determination follows.
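As a simple numerical illustration of blocks 902 and 904, the sketch below evaluates a closed-form RDF at an allowed distortion to obtain a rate in bits per symbol; the IID Gaussian form $R(D) = \frac{1}{2}\log(\sigma_{X}^{2}/D)$ used here is the one presented in section D.4, and all numerical values are hypothetical.

import math

def gaussian_rdf(sigma2, D):
    # R(D) = 0.5 * log2(sigma^2 / D) bits/symbol for an IID Gaussian
    # source with variance sigma^2, valid for 0 < D <= sigma^2.
    return 0.5 * math.log2(sigma2 / D) if D < sigma2 else 0.0

sigma2 = 4.0                 # hypothetical source variance
D_allowed = 0.5              # allowed squared-error distortion
rate = gaussian_rdf(sigma2, D_allowed)
print(f"compress at >= {rate:.2f} bits/symbol for distortion <= {D_allowed}")

# Range variant: rates for an allowed band of distortions.
for D in (0.25, 0.5, 1.0):
    print(D, gaussian_rdf(sigma2, D))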

Still further, the method 900 includes compressing information from a source according to the determined rate (block 906). The computing device implementing at least this portion of the method 900 may apply any number of suitable compression methods to compress information from a source of information, such as the source 102, such that the compressed information results in a rate (e.g., bits/symbol) at or less than the rate determined at block 904. Example compression methods may include, by way of example, A-law algorithms, code-excited linear predictions, linear predictive coding, mu-law algorithms, block truncation coding, fast cosine transform algorithms, set partitioning in hierarchical trees, etc. Alternatively, a computing device may utilize a coding scheme designed according to the JSCC methods discussed herein to simultaneously compress and encode information from a source.

The method 900 still further includes transmitting the compressed information at the determined rate (block 908). Once compressed, components of a communication system, such as the system 100, may transmit the compressed information at the determined rate. In some implementations, this may include further encoding of the compressed information, and, in other implementations utilizing JSCC, the compressed information may already be encoded for optimal transmission over a channel.

D.3. Closed Form Expressions for a Finite-Time Nonanticipative RDF

As discussed in section D.1, a finite-time (e.g., not taking an infinite number of transmissions) expression for the nonanticipative RDF may include an infimum. Below, closed-form expressions are presented for a nonstationary optimal reproduction conditional distribution, which distribution attains the infimum of the finite-time nonanticipative RDF, $R_{0,n}^{na}(D)$.

D.3.1. Optimal Reproduction Conditional Distribution for the Nonanticipative RDF

If the infimum of $R_{0,n}^{na}(D)$ is attained at:

$\{ P_{Y_{t}|Y^{t-1},X^{t}}^{*}(\cdot|y^{t-1},x^{t}) : t = 0,\ldots,n \},$

then $R_{0,n}^{na}(D)$ satisfies the following backward in time recursive equations. For $t = n$:

$P_{Y_{n}|Y^{n-1},X^{n}}^{*}(dy_{n}|y^{n-1},x^{n}) = \frac{e^{s\,\rho_{n}(T^{n}x^{n},T^{n}y^{n})}\, P_{Y_{n}|Y^{n-1}}^{*}(dy_{n}|y^{n-1})}{\int_{\mathbb{Y}_{n}} e^{s\,\rho_{n}(T^{n}x^{n},T^{n}y^{n})}\, P_{Y_{n}|Y^{n-1}}^{*}(dy_{n}|y^{n-1})},$

where $s < 0$ is the Lagrange multiplier of the fidelity constraint, and for $t = n-1, n-2, \ldots, 0$:

$P_{Y_{t}|Y^{t-1},X^{t}}^{*}(dy_{t}|y^{t-1},x^{t}) = \frac{e^{s\,\rho_{t}(T^{t}x^{n},T^{t}y^{n}) - g_{t,n}(x^{t},y^{t})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1})}{\int_{\mathbb{Y}_{t}} e^{s\,\rho_{t}(T^{t}x^{n},T^{t}y^{n}) - g_{t,n}(x^{t},y^{t})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1})},$

where $g_{t,n}(x^{t},y^{t})$ is given by:

$g_{t,n}(x^{t},y^{t}) = -\int_{\mathbb{X}_{t+1}} P_{X_{t+1}|X^{t}}(dx_{t+1}|x^{t}) \log\left( \int_{\mathbb{Y}_{t+1}} e^{s\,\rho_{t+1}(T^{t+1}x^{n},T^{t+1}y^{n}) - g_{t+1,n}(x^{t+1},y^{t+1})}\, P_{Y_{t+1}|Y^{t}}^{*}(dy_{t+1}|y^{t}) \right),$

and the finite time nonanticipative RDF is given by:

$R_{0,n}^{na}(D) = sD(n+1) - \sum_{t=0}^{n} \int_{\mathbb{X}^{t} \times \mathbb{Y}^{t-1}} \left\{ \int_{\mathbb{Y}_{t}} g_{t,n}(x^{t},y^{t})\, P_{Y_{t}|Y^{t-1},X^{t}}^{*}(dy_{t}|y^{t-1},x^{t}) + \log\left( \int_{\mathbb{Y}_{t}} e^{s\,\rho_{t}(T^{t}x^{n},T^{t}y^{n}) - g_{t,n}(x^{t},y^{t})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1}) \right) \right\} \otimes P_{X_{t}|X^{t-1}}(dx_{t}|x^{t-1}) \otimes P_{X^{t-1},Y^{t-1}}(dx^{t-1},dy^{t-1}).$

If $R_{0,n}^{na}(D) > 0$, then $s < 0$, and

$\frac{1}{n+1} \sum_{t=0}^{n} \int \rho_{t}(T^{t}x^{n},T^{t}y^{n})\, P_{X^{t},Y^{t}}(dx^{t},dy^{t}) = D.$

D.3.2. Information Structures of the Nonanticipative RDF

From the above expressions, given any source distribution, a computing device and/or operator of a communication system may identify a dependence of the optimal nonstationary reproduction distribution on past and present source symbols. However, the above expressions do not immediately yield a dependence on past reproduction symbols, referred to herein as "information structures." Regarding this dependence, the following observations are presented:

(1) The dependence of $P_{Y_{n}|Y^{n-1},X^{n}}^{*}(dy_{n}|y^{n-1},x^{n})$ on $x^{n} \in \mathbb{X}_{0,n}$ is determined by the dependence of $\rho_{n}(T^{n}x^{n},T^{n}y^{n})$ on $x^{n} \in \mathbb{X}_{0,n}$, as follows:

(2) If $\rho_{n}(T^{n}x^{n},T^{n}y^{n}) = \tilde{\rho}(x_{n},y^{n})$, then $P_{Y_{n}|Y^{n-1},X^{n}}^{*}(dy_{n}|y^{n-1},x^{n}) = P_{Y_{n}|Y^{n-1},X_{n}}^{*}(dy_{n}|y^{n-1},x_{n})$, while for $t = n-1, n-2, \ldots, 0$, the dependence of $P_{Y_{t}|Y^{t-1},X^{t}}^{*}(dy_{t}|y^{t-1},x^{t})$ on $x^{t} \in \mathbb{X}_{0,t}$ is determined by the dependence of $\rho_{t}(x^{t},y^{t})$ on $x^{t} \in \mathbb{X}_{0,t}$ and of $g_{t,n}(x^{t},y^{t})$ on $x^{t} \in \mathbb{X}_{0,t}$.

(3) If $P_{X_{t}|X^{t-1}}(dx_{t}|x^{t-1}) = P_{X_{t}|X_{t-1}}(dx_{t}|x_{t-1})$ and $\rho_{t}(T^{t}x^{n},T^{t}y^{n}) = \tilde{\rho}(x_{t},y_{t})$, then $P_{Y_{t}|Y^{t-1},X^{t}}^{*}(dy_{t}|y^{t-1},x^{t}) = P_{Y_{t}|Y^{t-1},X_{t}}^{*}(dy_{t}|y^{t-1},x_{t})$.

(4) If $g_{t,n}(x^{t},y^{t}) = g_{t,n}(x^{t},y^{t-1})$, $t = 0,\ldots,n-1$, the optimal reproduction distribution (IV.227) reduces to

$P_{Y_{t}|Y^{t-1},X^{t}}^{*}(dy_{t}|y^{t-1},x^{t}) = \frac{e^{s\,\rho_{t}(T^{t}x^{n},T^{t}y^{n})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1})}{\int_{\mathbb{Y}_{t}} e^{s\,\rho_{t}(T^{t}x^{n},T^{t}y^{n})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1})}, \quad t = 0,1,\ldots,n.$

D.3.3. Alternative Characterization of the Finite-Time Nonanticipative RDF

To further clarify the dependence of the optimal reproduction distribution on past reproductions, an alternative characterization of $R_{0,n}^{na}(D)$ may include a maximization over a certain class of functions. A computing device and/or operator of a communications system may utilize this alternative characterization to derive lower bounds on $R_{0,n}^{na}(D)$, which bounds are achievable.

The alternative characterization may be expressed as:

${R_{0,n}^{na}(D)} = {\sup\limits_{s \leq 0}\sup\limits_{\{{{{\lambda_{t} \in {\Psi_{s}^{t}:t}} = 0},\ldots \mspace{14mu},n}\}}{\left\{ {{{sD}\left( {n + 1} \right)} - {\sum\limits_{= 0}^{n}\; {\int_{_{0,t} \times _{0,{t - 1}}}^{\;}\left\{ {\left. {\int_{_{t}}^{\;}{{g_{t,n}\left( {x^{t},y^{t}} \right)}P_{{Y_{t}|Y^{t - 1}},X^{t}}^{*}\ {y_{t}}}} \middle| \ y^{t - 1} \right.,x^{t}} \right)}} + {\log \left( {\lambda_{t}\left( {x^{t},y^{t - 1}} \right)} \right)}} \right\} \otimes {P_{X_{t}|X^{t - 1}}\left( {x} \middle| x^{t - 1} \right)} \otimes {P_{X^{t - 1},Y^{t - 1}}\left( {{x^{t - 1}},{y^{t - 1}}} \right)}}}$     where     Ψ_(s)^(t){λ_(t)(x^(t), y^(t − 1)) ≥ 0:  ∫_(_(0, t − 1)) (∫_(_(t))  ^(s ρ_(t)(T^(t)x^(n), T^(t)y^(n)) − g_(t, n)(x^(t), y^(t)))λ_(t)(x^(t), y^(t − 1))P_(X_(t)|X^(t − 1))(x_(t)|x^(t − 1))) ⊗ P_(X^(t − 1)|Y^(t − 1))(x^(t − 1)|y^(t − 1)) ≤ 1}  g_(n, n)(x^(n), y^(n)) = 0,      g_(t, n)(x^(t), y^(t)) = −∫_(_(t + 1)) P_(X_(t + 1)|X^(t))(x_(t + 1)|x^(t))log (λ_(t + 1)(x^(t + 1), y^(t)))⁻¹, t = 0, 1, …  , n − 1.

For $s \in (-\infty,0]$, a necessary and sufficient condition to achieve the supremum of the above alternative characterization of $R_{0,n}^{na}(D)$ is the existence of a probability measure

$P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1})$

such that

$\lambda_{t}(x^{t},y^{t-1}) = \left\{ \int_{\mathbb{Y}_{t}} e^{s\,\rho_{t}(T^{t}x^{n},T^{t}y^{n}) - g_{t,n}(x^{t},y^{t})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1}) \right\}^{-1}, \quad t = 0,1,\ldots,n,$

and such that

$c(y^{t}) \triangleq \int_{\mathbb{X}_{0,t-1}} \left( \int_{\mathbb{X}_{t}} e^{s\,\rho_{t}(T^{t}x^{n},T^{t}y^{n}) - g_{t,n}(x^{t},y^{t})}\, \lambda_{t}(x^{t},y^{t-1})\, P_{X_{t}|X^{t-1}}(dx_{t}|x^{t-1}) \right) \otimes P_{X^{t-1}|Y^{t-1}}(dx^{t-1}|y^{t-1}) = 1, \quad t = 0,1,\ldots,n.$

The above alternative characterization of $R_{0,n}^{na}(D)$ may allow a computing device and/or operator to compute $R_{0,n}^{na}(D)$ exactly (e.g., as part of the example process 600) for a given source with memory.

D.4. Example Nonanticipative RDF for a Gaussian Source and Optimal Transmission Over an AWGN Channel

In an example scenario, the following expression describes a p-dimensional nonstationary Gaussian source process:

$X_{t+1} = A_{t}X_{t} + B_{t}W_{t}, \quad t = 0,1,\ldots,n-1,$

where

$A_{t} \in \mathbb{R}^{p \times p}, \quad B_{t} \in \mathbb{R}^{p \times k}, \quad t = 0,1,\ldots,n-1.$

For an Autoregressive Moving Average model with finite tap delays, there may exist a state space representation for some p. For the following example analysis in this scenario, assume:

(G1) $X_{0} \in \mathbb{R}^{p}$ is Gaussian $N(\bar{x}_{0},\bar{\Lambda}_{0})$;

(G2) $\{ W_{t} : t = 0,\ldots,n \}$ is a k-dimensional IID Gaussian $N(0,I_{k \times k})$ sequence, independent of $X_{0}$;

(G3) the distortion function is single letter, defined by $d_{0,n}(x^{n},y^{n}) \triangleq \sum_{t=0}^{n} \| x_{t} - y_{t} \|_{\mathbb{R}^{p}}^{2}.$

The nonstationary optimal reproduction distribution may be given, for $s \leq 0$, by the following recursive equations:

$P_{Y_{n}|Y^{n-1},X_{n}}^{*}(dy_{n}|y^{n-1},x_{n}) = \frac{e^{s\| y_{n} - x_{n} \|_{\mathbb{R}^{p}}^{2}}\, P_{Y_{n}|Y^{n-1}}^{*}(dy_{n}|y^{n-1})}{\int_{\mathbb{Y}_{n}} e^{s\| y_{n} - x_{n} \|_{\mathbb{R}^{p}}^{2}}\, P_{Y_{n}|Y^{n-1}}^{*}(dy_{n}|y^{n-1})}, \quad g_{n,n}(x_{n},y^{n}) = 0,$

$P_{Y_{t}|Y^{t-1},X_{t}}^{*}(dy_{t}|y^{t-1},x_{t}) = \frac{e^{s\| y_{t} - x_{t} \|_{\mathbb{R}^{p}}^{2} - g_{t,n}(x_{t},y^{t})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1})}{\int_{\mathbb{Y}_{t}} e^{s\| y_{t} - x_{t} \|_{\mathbb{R}^{p}}^{2} - g_{t,n}(x_{t},y^{t})}\, P_{Y_{t}|Y^{t-1}}^{*}(dy_{t}|y^{t-1})}, \quad t = n-1,n-2,\ldots,0,$

$g_{t,n}(x_{t},y^{t}) = -\int_{\mathbb{X}_{t+1}} P_{X_{t+1}|X_{t}}(dx_{t+1}|x_{t}) \log\left( \int_{\mathbb{Y}_{t+1}} e^{s\| y_{t+1} - x_{t+1} \|_{\mathbb{R}^{p}}^{2} - g_{t+1,n}(x_{t+1},y^{t+1})}\, P_{Y_{t+1}|Y^{t}}^{*}(dy_{t+1}|y^{t}) \right), \quad t = n-1,n-2,\ldots,0.$

Thus, the optimal reproduction distributions may be conditionally Gaussian, and the optimal reproduction distributions may be realized using a general Gaussian channel with memory, modeled by:

$Y_{t} = \bar{A}_{t}X_{t} + \bar{B}_{t}Y^{t-1} + V_{t}^{c}, \quad t = 0,\ldots,n,$

where

$\bar{A}_{t} \in \mathbb{R}^{p \times p}, \quad \bar{B}_{t} \in \mathbb{R}^{p \times tp},$

and $\{ V_{t}^{c} : t = 0,\ldots,n \}$ are independent sequences of Gaussian vectors:

$\{ N(0;Q_{t}) : t = 0,\ldots,n \}.$

Gaussian error processes may introduce the pre-processing at the encoder and/or decoder, in this example. Let

$\{ K_{t} : t = 0,\ldots,n \}, \quad K_{t} \triangleq X_{t} - E\{ X_{t}|Y^{t-1} \},$

denote the error process, and denote the covariance of the pre-processing by:

$\Lambda_{t} \triangleq E\{ K_{t}K_{t}^{tr} \}, \quad t = 0,\ldots,n.$

Also, let $E_{t}$ be a unitary matrix such that:

$E_{t}\Lambda_{t}E_{t}^{tr} = \mathrm{diag}\{ \lambda_{t,1},\ldots,\lambda_{t,p} \}, \quad \Gamma_{t} \triangleq E_{t}K_{t}, \quad t = 0,\ldots,n.$

Analogously, to obtain the nonanticipative RDF in this example, the processes:

$\{ \tilde{K}_{t} : t = 0,\ldots,n \}$

defined by

$\tilde{K}_{t} \triangleq Y_{t} - E\{ X_{t}|Y^{t-1} \} \equiv Y_{t} - \hat{X}_{t|t-1}, \quad \tilde{\Gamma}_{t} = E_{t}\tilde{K}_{t},$

are introduced. Using properties of conditional entropy, observations for this example scenario include:

$d_{0,n}(X^{n},Y^{n}) = d_{0,n}(K^{n},\tilde{K}^{n}) = \sum_{t=0}^{n} \| \tilde{K}_{t} - K_{t} \|_{\mathbb{R}^{p}}^{2} = \sum_{t=0}^{n} \| \tilde{\Gamma}_{t} - \Gamma_{t} \|_{\mathbb{R}^{p}}^{2}$

and

$R_{0,n}^{na}(D) = R_{0,n}^{na,K^{n},\tilde{K}^{n}}(D) = R_{0,n}^{na,\Gamma^{n},\tilde{\Gamma}^{n}}(D).$

Using these observations, a computing device and/or operator of a communication system may obtain the optimal (e.g., capacity achieving) nonanticipative RDF for the above defined multidimensional Gaussian process. The computing device and/or operator may also identify a "duality" between a multidimensional Gaussian process and a MIMO AWGN channel. These results are described below by way of example.

The $R_{0,n}^{na}(D)$ of the example Gaussian source, according to the definitions in section D.1 and the above discussed model of the Gaussian source, is given by:

$R_{0,n}^{na}(D) = \frac{1}{2} \sum_{t=0}^{n} \sum_{i=1}^{p} \log\left( \frac{\lambda_{t,i}}{\delta_{t,i}} \right),$

where $\{ \lambda_{t,i} : i = 1,\ldots,p \}$ are the eigenvalues of $\Lambda_{t}$, $\hat{X}_{t|t-1} \triangleq E\{ X_{t}|Y^{t-1} \}$,

$\delta_{t,i} \triangleq \begin{cases} \xi & \text{if } \xi \leq \lambda_{t,i} \\ \lambda_{t,i} & \text{if } \xi > \lambda_{t,i} \end{cases}, \quad t = 0,\ldots,n, \quad i = 1,\ldots,p,$

and $\xi$ is chosen such that $\frac{1}{n+1} \sum_{t=0}^{n} \sum_{i=1}^{p} \delta_{t,i} = D.$

where the error

$K_{t} = X_{t} - E\{ X_{t}|Y^{t-1} \}$

is Gaussian, and where $\Lambda_{t}$ and $\hat{X}_{t|t-1}$ are given by the following Kalman filter equations:

$\hat{X}_{t+1|t} = A_{t}\hat{X}_{t|t-1} + A_{t}\Lambda_{t}(E_{t}^{tr}H_{t}E_{t})^{tr}M_{t}^{-1}(Y_{t} - \hat{X}_{t|t-1}), \quad \hat{X}_{0|-1} = \bar{x}_{0},$

$\Lambda_{t+1} = A_{t}\Lambda_{t}A_{t}^{tr} - A_{t}\Lambda_{t}(E_{t}^{tr}H_{t}E_{t})^{tr}M_{t}^{-1}(E_{t}^{tr}H_{t}E_{t})\Lambda_{t}A_{t}^{tr} + B_{t}B_{t}^{tr}, \quad \Lambda_{0} = \bar{\Lambda}_{0},$

$M_{t} = E_{t}^{tr}H_{t}E_{t}\Lambda_{t}(E_{t}^{tr}H_{t}E_{t})^{tr} + E_{t}^{tr}\Theta_{t}Q_{t}\Theta_{t}^{tr}E_{t},$

$H_{t} \triangleq \mathrm{diag}\{ \eta_{t,1},\ldots,\eta_{t,p} \}, \quad \eta_{t,i} = 1 - \frac{\delta_{t,i}}{\lambda_{t,i}}, \quad Q_{t} \triangleq \mathrm{Cov}(V_{t}^{c}), \quad \Theta_{t} \triangleq \sqrt{H_{t}\Delta_{t}Q_{t}^{-1}}, \quad \Delta_{t} = \mathrm{diag}\{ \delta_{t,1},\ldots,\delta_{t,p} \}, \quad i = 1,\ldots,p, \quad t = 0,\ldots,n.$

Moreover,

$\sigma\{ Y^{t} \} = \sigma\{ \tilde{K}^{t} \} = \sigma\{ B^{t} \}, \quad t = 0,\ldots,n$

(i.e., these processes generate the same information).
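Numerically, the distortion allocation $\delta_{t,i}$ in the expression for $R_{0,n}^{na}(D)$ above is a reverse water-filling: each eigenvalue $\lambda_{t,i}$ is clipped at a common level $\xi$ chosen so the average distortion equals D. A minimal sketch follows, assuming a stationary case (so the t index is dropped) and hypothetical eigenvalues.

import numpy as np

def reverse_waterfill(lam, D, tol=1e-10):
    # Find xi such that mean(min(xi, lam)) == D, then return the
    # per-component distortions delta_i = min(xi, lam_i).
    lo, hi = 0.0, max(lam)
    while hi - lo > tol:
        xi = 0.5 * (lo + hi)
        if np.minimum(xi, lam).mean() < D:
            lo = xi
        else:
            hi = xi
    return np.minimum(0.5 * (lo + hi), lam)

lam = np.array([4.0, 2.0, 0.5])            # hypothetical eigenvalues of Lambda
D = 1.0                                    # allowed average distortion
delta = reverse_waterfill(lam, D)
R = 0.5 * np.sum(np.log(lam / delta))      # nats, per the R_{0,n}^{na} formula
print(delta, R)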

FIG. 10A depicts a combination of encoder-channel-decoder in a communication system 1000 that realizes $R_{0,n}^{na}(D)$. That is, information encoded by an encoder 1002, transmitted over one or more channels 1004, and decoded by a decoder 1006 is distorted according to $R_{0,n}^{na}(D)$. A computing device may compress information transmitted over the channels 1004 at a certain rate (e.g., number of bits/symbol), defined by $R_{0,n}^{na}(D)$, to limit distortion of the information below a threshold distortion, as further discussed with reference to FIG. 9. In this example, the realization of the optimal nonstationary reproduction distribution is given by:

$Y_{t} = E_{t}^{tr}H_{t}E_{t}(X_{t} - \hat{X}_{t|t-1}) + E_{t}^{tr}\Theta_{t}V_{t}^{c} + \hat{X}_{t|t-1}.$

The encoder 1002 and decoder 1006 of the system 1000 may encode and decode, respectively, with feedback according to:

$A_{t} = E_{t}^{tr}H_{t}E_{t}(X_{t} - \hat{X}_{t|t-1}),$

$Y_{t} = E_{t}^{tr}H_{t}E_{t}(X_{t} - \hat{X}_{t|t-1}) + E_{t}^{tr}\Theta_{t}V_{t}^{c} + \hat{X}_{t|t-1}.$

Alternatively, the encoder 1002 and decoder 1006 of the system 1000 may encode and decode, respectively, without feedback according to:

$A_{t} = E_{t}^{tr}H_{t}E_{t}(X_{t} - E\{ X_{t} \}),$

$Y_{t} = E_{t}^{tr}H_{t}E_{t}(X_{t} - E\{ X_{t} \}) + E_{t}^{tr}\Theta_{t}V_{t}^{c} + E\{ X_{t} \}.$

By taking a limit of $R_{0,n}^{na}(D)$, a computing device and/or operator may obtain the per unit time nonanticipative RDF $R^{na}(D)$ from $R_{0,n}^{na}(D)$. This $R^{na}(D)$ may represent the rate distortion function of stationary (e.g., not varying in time) Gaussian sources of information, in this example. Specifically, the $R^{na}(D)$ is obtained as follows:

$\begin{matrix}{{R^{na}(D)} = {\frac{1}{2}{\lim\limits_{n\rightarrow\infty}{\frac{1}{n + 1}{\sum\limits_{t = 0}^{n}\; {\sum\limits_{i = 1}^{p}\; {\log \left( \frac{\lambda_{t,i}}{\delta_{t,1}} \right)}}}}}}} \\{= {\frac{1}{2}{\sum\limits_{i = 1}^{p}\; {{\log \left( \frac{\lambda_{\infty,i}}{\delta_{\infty,i}} \right)}.}}}}\end{matrix}$

wherelim_(n→∞)δ_(t,i)=δ_(∞,i) and lim_(n→∞)λ_(t,i)=λ_(∞,i).

In addition, for a scalar Gaussian stationary source:

$X_{t+1} = \alpha X_{t} + \sigma_{W}W_{t}, \quad W_{t} \sim N(0;1),$

$R^{na}(D) = \frac{1}{2} \log\left( \frac{\lambda_{\infty,1}}{\delta_{\infty,1}} \right) = \frac{1}{2} \log\left( \alpha^{2} + \frac{\sigma_{W}^{2}}{D} \right), \quad \delta_{\infty,1} = D, \quad \lambda_{\infty,1} = \alpha^{2}D + \sigma_{W}^{2}, \quad \lambda_{\infty,1} \geq \delta_{\infty,1}.$
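A quick numerical check of this scalar stationary formula, with hypothetical values of $\alpha$, $\sigma_{W}^{2}$, and D:

import math

def rna_ar1(alpha, sigma_w2, D):
    # Nonanticipative RDF of the scalar Gaussian AR(1) source
    # X_{t+1} = alpha*X_t + sigma_W*W_t:
    #   R^na(D) = 0.5 * log(alpha^2 + sigma_W^2 / D),
    # valid while lambda_inf = alpha^2*D + sigma_W^2 >= delta_inf = D.
    lam = alpha**2 * D + sigma_w2
    assert lam >= D, "formula valid only for lambda_inf >= delta_inf"
    return 0.5 * math.log(alpha**2 + sigma_w2 / D)

print(rna_ar1(alpha=0.9, sigma_w2=1.0, D=0.5))  # nats per source symbol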

For an independent and identically distributed (IID) scalar Gaussian source:

$R^{na}(D) = R(D) = \frac{1}{2} \log \frac{\sigma_{X}^{2}}{D}, \quad \sigma_{X}^{2} \geq D.$

Note, for a vector of independent sources with memory, the nonanticipative RDF involves water-filling in the spatial dimension. The realization of these sources over an Additive White Gaussian Noise channel with an average power constraint not exceeding P is shown in FIG. 10B.

Returning to FIG. 10A, the encoder 1002 may implement the compression scheme for vector correlated sources, and the encoder 1002 may achieve the capacity of a communication system where the source is a Gaussian random process and where information is transmitted over additive white Gaussian noise (AWGN) channels. This encoding may also be equivalent to JSCC design using symbol-by-symbol transmission of a vector Gaussian source with memory over a multiple-input multiple-output Gaussian memoryless channel. In this capacity achieving case, the encoded symbols may be expressed with respect to the encoder 1002, which either includes feedback (case (1)) or does not include feedback (case (2)). That is, the special cases (1) and (2) below follow from the general nonanticipative RDF of correlated Gaussian sources with a square error distortion function.

(1) Capacity Achieving Realization with Feedback.

Consider the realization of the

$R(D) = R^{na}(D) = \frac{1}{2} \log \frac{\sigma_{X}^{2}}{D}$

(i.e., for IID processes the classical RDF equals the nonanticipative RDF). Let X be a RV $N(0;\sigma_{X}^{2})$ with $\sigma_{X}^{2} \geq D$. Letting $p = 1$, then from (IV.256)-(IV.257) we have

$Q_{t} = q_{t,1}, \quad H_{t} = 1 - \frac{\delta_{t,1}}{\lambda_{t,1}}, \quad \frac{P_{t,1}}{q_{t,1}} = \frac{\lambda_{t,1}}{\delta_{t,1}} - 1,$

which implies

$B_{t} = \sqrt{\frac{P_{t,1}}{\lambda_{t,1}}}K_{t} + V_{t}^{c}, \quad t = 0,\ldots,n,$

where the encoder gain is $\sqrt{P_{t,1}/\lambda_{t,1}}$, $\lambda_{t,1} = \mathrm{Var}(X - E(X|B^{t-1}))$, and $K_{t} = X - E(X|B^{t-1})$.

Substituting into the encoder the limiting values, $\delta_{\infty,1} = D$, $\lim_{n \rightarrow \infty} q_{t,1} = q_{\infty,1}$, $\lim_{n \rightarrow \infty} P_{t,1} = P_{\infty,1} \triangleq P$, then for $t = 0,1,\ldots$

$B_{t} = \sqrt{\frac{P}{E\{ X - E(X|B^{t-1}) \}^{2}}}\{ X - E(X|B^{t-1}) \} + V_{t}^{c}, \quad t = 0,1,\ldots$
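The innovations encoder above can be simulated for a scalar Gaussian RV transmitted over an AWGN channel with feedback: at each step the transmitter scales the current estimation error to power P, and the receiver updates its minimum mean-square estimate, so that under this model the error variance decays geometrically as $\lambda_{t+1} = \lambda_{t}q/(P+q)$. The power, noise variance, and horizon below are hypothetical.

import numpy as np

# Sketch of the capacity-achieving feedback realization for a scalar
# Gaussian RV over an AWGN channel: transmit the scaled estimation
# error (innovation) at power P each step. P, q, n are hypothetical.
rng = np.random.default_rng(2)
P, q, n = 1.0, 1.0, 20          # transmit power, noise variance, steps
sigma2_X = 4.0
X = rng.normal(0.0, np.sqrt(sigma2_X))

x_hat, lam = 0.0, sigma2_X      # MMSE estimate and its error variance
for t in range(n):
    g = np.sqrt(P / lam)                        # scale error to power P
    B = g * (X - x_hat) + rng.normal(0.0, np.sqrt(q))
    x_hat += (g * lam / (g**2 * lam + q)) * B   # MMSE update
    lam *= q / (P + q)                          # geometric error decay

print(abs(X - x_hat), lam)      # realized error vs. predicted variance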

(2) Capacity Achieving Realization without Feedback.

When there is no feedback, all statements in (1) hold with $\lambda_{\infty,1} = \sigma_{X}^{2}$, while $E(X|B^{t-1})$ is replaced by $E(X_{t}|\sigma\{ \emptyset \}) = E(X)$ (i.e., only a priori information is used), and then (IV.264) reduces to

$B_{t} = \sqrt{\frac{P}{E\{ X_{t} - E(X_{t}) \}^{2}}}\{ X_{t} - E(X_{t}) \} + V_{t}^{c}, \quad t = 0,1,\ldots$

E. Joint Source Channel Coding Design

The following description may refer to "Joint Source Channel Coding," or JSCC, as a coding/decoding scheme or a design of a coding/decoding scheme that does not separate source encoders and decoders and channel encoders and decoders. Such JSCC may produce a coding scheme that both compresses and encodes data for transmission, where the compression occurs with zero-delay and the compression and encoding allow the transmission to achieve a capacity of a channel.

To further clarify this point and by way of example, FIG. 11 depicts another example communication system 1100. A JSCC design of an encoder 1102 may not separate the source coding and channel coding of the encoder 1102. That is, the encoder 1102 may include a single code to perform both source coding (e.g., compression) and channel coding (e.g., encoding of the compressed symbols). Similarly, a JSCC design of a decoder 1104 may not separate the source decoding and channel decoding of the decoder 1104.

E.1. Methods for Joint Source Channel Coding Design

FIG. 12 is a flow diagram of an example method 1200 for designing and utilizing an {encoder, decoder} pair based on JSCC. A computing device, such as the computing device illustrated in detail in FIG. 15, may be specially and/or specifically configured to implement at least a portion of the method 1200. Further, in some implementations, a suitable combination of a computing device, an operator of a communication system, such as the communication system 100, and one or more other suitable components of a communication system (e.g., routers, modems, gateways, etc.) may implement the method 1200.

The example method 1200 includes determining a rate distortion function (RDF) of the source (block 1202). The source, for example, may be the source 102, and a computing device and/or operator may determine the RDF to be a nonanticipative RDF corresponding to the source 102, as further discussed in section D entitled "Compressing Information with Zero-Delay." In this manner, the computing device and/or operator may determine an RDF that is zero-delay. In some implementations, the determined RDF may include one or more realizations of an encoder, channel, and decoder representing the RDF, and, in other applications, the determined RDF may include one or more sets of data, algorithms, or instructions that represent the RDF and are stored on a non-transitory computer-readable medium.

The example method 1200 also includes determining a capacity of a channel over which information, generated by the source, is to be transmitted (block 1204). The determined capacity may include a characterization or formula for a finite block length feedback capacity, feedback capacity, finite block length feedback capacity with transmission cost, feedback capacity with transmission cost, finite block length capacity without feedback, capacity without feedback, finite block length capacity without feedback and with transmission cost, or capacity without feedback and with transmission cost. A computing device and/or operator may determine this characterization or formula according to the method 300, for example.

An {encoder, decoder} pair is then identified (block 1206), where the {encoder, decoder} pair realizes the determined RDF and achieves the determined capacity. The {encoder, decoder} pair may include a single code for the encoder, which compresses and encodes symbols from a source, such as the source 102. The {encoder, decoder} pair may also include a single decoder, which both decodes and decompresses symbols transmitted over the channel. In other words, the encoder may receive uncompressed symbols from a source and output compressed and encoded symbols for transmission over a channel, and the decoder may receive the transmitted symbols and output uncompressed and decoded symbols.

Identifying the {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity may include identifying an {encoder, decoder} pair that satisfies one or more conditions, in an implementation. As discussed further below, a computing device and/or operator may utilize the determined RDF and the determined capacity to generate specific conditions that the identified {encoder, decoder} pair must satisfy. Then the computing device and/or operator may identify a specific {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity.

Returning to FIG. 12, the example method 1200 may also include transmitting information according to the identified {encoder, decoder} pair (block 1208). A computing device and/or operator may configure one or more components of a communication system, such as computing devices, modems, routers, etc., to utilize the identified codes for encoding and decoding in transmitting information. This specific configuration may include specifically programming specialized modules of a computing device or other tangible communications device with the identified codes for encoding and decoding. A computing device and/or operator may then activate transmissions of information according to the identified {encoder, decoder} pair. These transmissions may be optimal or near optimal in the sense that R(D)=C(K) for a range of values (D, K).
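The matching step of block 1206 can be pictured as a search over operating points. The following Python sketch is illustrative only: the callables `rdf` and `capacity` are hypothetical stand-ins for whatever RDF and capacity computations blocks 1202 and 1204 produce, and the function name is not defined in this disclosure.

```python
def find_matched_operating_points(rdf, capacity, distortions, costs, tol=1e-6):
    """Return the (D, K) pairs with R(D) = C(K) within a numerical
    tolerance; at such pairs an {encoder, decoder} pair can be optimal
    or near optimal in the sense described above."""
    return [(D, K)
            for D in distortions
            for K in costs
            if abs(rdf(D) - capacity(K)) <= tol]

# Usage with placeholder callables (purely illustrative):
# points = find_matched_operating_points(lambda D: 1 - 2 * D,
#                                        lambda K: 0.5 * K,
#                                        [i / 100 for i in range(50)],
#                                        [i / 100 for i in range(100)])
```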

E.2. Methodology of JSCC Design for General {Source, Channel} Pairs with Memory

In this section, example methodologies for JSCC design for general {source, channel} pairs with memory and with respect to {distortion function, transmission cost} pairs are developed. A computing device and/or operator may implement these methodologies as part of the example method 1200, for example, or in another method in which encoders and/or decoders are designed according to JSCC. The methodologies are for JSCC design with respect to general {distortion function, transmission cost} pairs, where some examples of distortion functions and transmission cost functions are further discussed in sections B and D.

E.2.1. Nonanticipative (Zero-Delay) Code

To facilitate the development of JSCC methodologies, an example definition of a nonanticipative code is introduced below. The definition is for any

$\{\text{source, channel}\} \equiv \{P_{X^n}, \{P_{B_i|B^{i-1},A^i} : i = 0, 1, \ldots, n\}\}$

with respect to any

$\{\text{distortion function, transmission cost}\} \equiv \{d_{0,n}, c_{0,n}\}.$

A set of randomized nonanticipative encoders with feedback, denoted by $\mathcal{E}_{[0,n]}^{fb,E}$, may be a sequence of conditional distributions:

$P_{A_i|A^{i-1},B^{i-1},X^i}(a_i | a^{i-1}, b^{i-1}, x^i), \quad i = 0, 1, \ldots, n.$

Also, an example set of randomized feedback encoders embeds nonanticipative deterministic feedback encoders defined by:

$\mathcal{E}_{[0,n]}^{fb} \triangleq \{e_i : \mathcal{A}^{i-1} \times \mathcal{B}^{i-1} \times \mathcal{X}^i \mapsto \mathcal{A}_i,\ a_i = e_i(a^{i-1}, b^{i-1}, x^i),\ i = 0, \ldots, n\} \subset \mathcal{E}_{[0,n]}^{fb,E}.$

The example encoders introduced above are nonanticipative in the sense that, at each transmission time i,

$P_{A_i|A^{i-1},B^{i-1},X^i}(a_i | a^{i-1}, b^{i-1}, x^i)$ and $e_i(a^{i-1}, b^{i-1}, x^i)$

do not depend on any future symbols (e.g., symbols to be transmitted at future times). The encoders may only be functions of past and present source symbols and past channel inputs and outputs, in an implementation.

A set of randomized and deterministic nonanticipative encoders without feedback may be defined by:

$\mathcal{E}_{[0,n]}^{nfb,E} \triangleq \{P_{A_i|A^{i-1},X^i} : i = 0, 1, \ldots, n\} \subset \mathcal{E}_{[0,n]}^{fb,E}, \qquad \mathcal{E}_{[0,n]}^{nfb} \triangleq \{e_i(\cdot,\cdot) : a_i = e_i^{nfb}(a^{i-1}, x^i)\}_{i=0}^{n} \subset \mathcal{E}_{[0,n]}^{fb}.$
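As an informal illustration of these definitions (not part of the disclosure's formal development), a deterministic nonanticipative encoder can be represented in Python as a callable that receives only histories, never future symbols. The class name and interface below are hypothetical.

```python
class CausalEncoder:
    """A deterministic nonanticipative encoder: a_i = e_i(a^{i-1}, b^{i-1}, x^i).

    The rule receives the past channel inputs a^{i-1}, the past channel
    outputs b^{i-1} (the feedback), and the source symbols x^i up to and
    including the present -- and nothing else.
    """

    def __init__(self, rule):
        self.rule = rule

    def encode(self, a_past, b_past, x_upto_i):
        return self.rule(a_past, b_past, x_upto_i)


# The feedback encoder of section E.3, a_i = x_i XOR b_{i-1}
# (with b_{-1} taken to be 0):
xor_encoder = CausalEncoder(
    lambda a_past, b_past, x: x[-1] ^ (b_past[-1] if b_past else 0))

# An encoder without feedback simply ignores b_past, e.g., the identity
# mapping a_i = x_i, which is optimal in the example of section E.3:
identity_encoder = CausalEncoder(lambda a_past, b_past, x: x[-1])
```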

Also, a randomized decoder may be a sequence of conditional distributions defined by

$\mathcal{D}_{[0,n]}^{E} \triangleq \{P_{Y_i|Y^{i-1},B^i}(y_i | y^{i-1}, b^i) : i = 0, 1, \ldots, n\},$

embedding deterministic decoders denoted by:

$\mathcal{D}_{[0,n]} \triangleq \{g_i(\cdot,\cdot) : y_i = g_i(y^{i-1}, b^i)\}_{i=0}^{n} \subset \mathcal{D}_{[0,n]}^{E}.$

Given any source, nonanticipative encoder as defined above, and randomized decoder as defined above, a joint probability distribution may be defined as:

$P_{X^n,A^n,B^n,Y^n}(dx^n, da^n, db^n, dy^n) = \prod_{i=0}^{n} P_{Y_i|Y^{i-1},B^i}(dy_i | y^{i-1}, b^i)\, P_{B_i|B^{i-1},A^i}(db_i | b^{i-1}, a^i)\, P_{A_i|A^{i-1},B^{i-1},X^i}(da_i | a^{i-1}, b^{i-1}, x^i)\, P_{X_i|X^{i-1}}(dx_i | x^{i-1}).$
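For finite alphabets, this factorization can be evaluated directly by multiplying the four kernels time step by time step. The following sketch (an illustration under the assumption of binary alphabets and kernels supplied as Python callables; none of these names are from the disclosure) enumerates the joint distribution exactly as the product above. It is exponential in n and intended only to make the chain-rule structure concrete.

```python
from itertools import product

def joint_distribution(P_X, P_A, P_B, P_Y, n, alphabet=(0, 1)):
    """P(x^n, a^n, b^n, y^n) built from the four causal kernels:
    P_X(x_i, x_past), P_A(a_i, a_past, b_past, x_upto_i),
    P_B(b_i, b_past, a_upto_i), P_Y(y_i, y_past, b_upto_i)."""
    joint = {}
    seqs = list(product(alphabet, repeat=n + 1))
    for xs, aa, bb, yy in product(seqs, repeat=4):
        p = 1.0
        for i in range(n + 1):
            p *= P_X(xs[i], xs[:i])                        # source kernel
            p *= P_A(aa[i], aa[:i], bb[:i], xs[:i + 1])    # encoder kernel
            p *= P_B(bb[i], bb[:i], aa[:i + 1])            # channel kernel
            p *= P_Y(yy[i], yy[:i], bb[:i + 1])            # decoder kernel
        joint[(xs, aa, bb, yy)] = p
    return joint
```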

Given any

$\{P_{X^n}, \{P_{B_i|B^{i-1},A^i} : i = 0, 1, \ldots, n\}\},$

a nonanticipative code for JSCC design of a system,

$\Sigma_{JSCC}(\mathcal{X}^n, \mathcal{A}^n, \mathcal{B}^n, \mathcal{Y}^n, P_{X^n}, \{P_{B_i|B^{i-1},A^i} : i = 0, 1, \ldots, n\}, d_{0,n}, c_{0,n}),$

is a nonanticipative {encoder, decoder} pair

$\{P_{A_i|A^{i-1},B^{i-1},X^i}, P_{Y_i|Y^{i-1},B^i} : i = 0, 1, \ldots, n\}.$

This nonanticipative {encoder, decoder} pair has an excess distortion probability:

$\mathbf{P}\{d_{0,n}(x^n, y^n) > (n+1)d\} \leq \varepsilon, \quad \varepsilon \in (0,1), \quad d \geq 0,$

and transmission cost:

$\frac{1}{n+1}\, \mathbf{E}\{c_{0,n}(A^n, B^{n-1})\} \leq \kappa, \quad \kappa \geq 0.$

The minimum excess distortion achievable by a nonanticipative code may be:

$D^{o}(n, \varepsilon, \kappa) \triangleq \inf\{d : \exists\ \text{a}\ (n, d, \varepsilon, \kappa)\ \text{nonanticipative code}\}.$

E.2.2. Realization of a Reproduction Distribution of the Finite-Time RDF

The following description defines an example "realization" of the conditional distribution corresponding to the $R_{0,n}^{na}(D)$ for a given source. FIG. 13 illustrates such a realization. As described further with reference to FIG. 12, this type of realization may be utilized for JSCC design of an {encoder, decoder} pair.

Given a nonanticipative code:

$\Sigma_{JSCC}(\mathcal{X}^n, \mathcal{A}^n, \mathcal{B}^n, \mathcal{Y}^n, P_{X^n}, \{P_{B_i|B^{i-1},A^i} : i = 0, 1, \ldots, n\}, d_{0,n}, c_{0,n}),$

a realization of the optimal reproduction distribution:

$\{P^{*}_{Y_i|Y^{i-1},X^i}(y_i | y^{i-1}, x^i) : i = 0, 1, \ldots, n\}$

corresponding to $R_{0,n}^{na}(D)$ may be an {encoder, decoder} pair, with the encoder used with or without feedback, such that

$\overrightarrow{P}^{*}_{Y^n|X^n}(dy^n | x^n) = \prod_{i=0}^{n} P^{*}_{Y_i|Y^{i-1},X^i}(dy_i | y^{i-1}, x^i).$

In such a case, the $R_{0,n}^{na}(D)$ is "realizable," because the realization operates with average distortion $\leq D$.

In some implementations, given a system:

$\Sigma_{JSCC}(\mathcal{X}^n, \mathcal{A}^n, \mathcal{B}^n, \mathcal{Y}^n, P_{X^n}, \{P_{B_i|B^{i-1},A^i} : i = 0, 1, \ldots, n\}, d_{0,n}, c_{0,n}),$

multiple realizations of the optimal reproduction distribution may exist. For example, in systems utilizing Gaussian sources and channels with memory, realizations with and without feedback encoding may exist.

To identify the {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity, as discussed with reference to FIG. 12, a computing device or operator may identify the realization that satisfies

$R_{0,n}^{na}(D) = C_{A^n \to B^n}^{FB}(\kappa).$

Then, for such a realization, a computing device and/or operator may compute the excess distortion probability and the minimum excess distortion achievable. Because the RDFs and capacities may be determined (e.g., according to the example methods 300 and 600), the {encoder, decoder} pair follows from the above condition.

E.3. Example JSCC Design for a Binary Symmetric Markov Source Over a Binary State Symmetric Channel with Memory

Although the following description provides an example design of an {encoder, decoder} pair according to JSCC for a specific type of source and channel, the techniques of the present disclosure may allow {encoder, decoder} pairs to be designed for other types of channels and/or sources. For example, the techniques disclosed herein may be utilized to generate {encoder, decoder} pairs for binary, symmetric, non-symmetric, Gaussian, non-Gaussian, stationary, non-stationary, etc. sources and/or channels. In fact, the realization of FIG. 10A corresponds to a JSCC design.

In the following example, an {encoder, decoder} pair, designed according to JSCC, realizes an RDF for a source and operates at the capacity of the channel. Specifically, the example includes a binary symmetric Markov source (BSMS(p)) and a binary state symmetric channel (BSSC(α₁, β₁)). The example also utilizes a single letter Hamming definition of distortion and a single letter instantaneous cost function with and without feedback.

The nonanticipative RDF of the BSMS(p) may be developed from the following transition probabilities:

$P_{X_i|X_{i-1}}(x_i | x_{i-1}) = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}.$

Also, the single letter Hamming distortion criterion between the source symbol and the reproduction (e.g., decoded) symbol may be defined by:

$\rho(x_i, y_i) = 0$ if $x_i = y_i$, and 1 otherwise.

According to the definitions of nonanticipative RDF presented in section D, $R^{na}(D)$ of a BSMS(p) with single letter Hamming distortion is given by:

$R^{na}(D) = \begin{cases} H(p) - mH(\alpha) - (1-m)H(\beta) & \text{if } D \leq \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}$

where D is the average distortion, $H(\cdot)$ is the binary entropy function, and

$m = 1 - p - D + 2pD, \quad \alpha = \frac{(1-p)(1-D)}{1-p-D+2pD}, \quad \beta = \frac{p(1-D)}{p+D-2pD}.$
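As a sketch (not part of the disclosure), the closed-form expression above can be evaluated directly; the following Python function implements it exactly as written, using the binary entropy H in bits.

```python
from math import log2

def binary_entropy(q):
    """H(q) in bits, with H(0) = H(1) = 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def rna_bsms(p, D):
    """Nonanticipative RDF R^na(D) of a BSMS(p) under single letter
    Hamming distortion, per the expression above (0 < p < 1, D >= 0)."""
    if D >= 0.5:
        return 0.0
    m = 1 - p - D + 2 * p * D
    alpha = (1 - p) * (1 - D) / m
    beta = p * (1 - D) / (p + D - 2 * p * D)
    return (binary_entropy(p) - m * binary_entropy(alpha)
            - (1 - m) * binary_entropy(beta))

# Example: rna_bsms(0.3, 0.1) evaluates to roughly 0.46 bits per symbol.
```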

The optimal reproduction distribution may be given by:

$P^{*}_{Y_i|X_i,Y_{i-1}}(y_i | x_i, y_{i-1}) = \begin{bmatrix} \alpha & \beta & 1-\beta & 1-\alpha \\ 1-\alpha & 1-\beta & \beta & \alpha \end{bmatrix},$

where the columns are indexed by $(x_i, y_{i-1}) \in \{(0,0), (0,1), (1,0), (1,1)\}$ and the rows by $y_i \in \{0, 1\}$.

To determine the capacity of the BSSC(α₁, β₁) with and without feedback, a computing device and/or operator may, in this example, consider a special case of the unit memory channel with the same structure as the optimal reproduction distribution:

${P_{{B_{i}A_{i}},B_{i - 1}}\left( {{b_{i}a_{i}},b_{i - 1}} \right)} = {\begin{matrix}0 \\1\end{matrix}{\overset{\begin{matrix}{0,0} & {0,1} & {1,0} & {1,1}\end{matrix}}{\begin{bmatrix}\alpha_{1} & \beta_{1} & {1 - \beta_{1}} & {1 - \alpha_{1}} \\{1 - \alpha_{1}} & {1 - \beta_{1}} & \beta_{1} & \alpha_{1}\end{bmatrix}}.}}$

Also, the state of the channel may be:

$s_i \triangleq a_i \oplus b_{i-1}.$

The single letter cost function may be:

$\gamma(a_i, b_{i-1}) = \begin{cases} 1 & \text{if } a_i = b_{i-1}\ (s_i = 0) \\ 0 & \text{if } a_i \neq b_{i-1}\ (s_i = 1). \end{cases}$

Also, an average transmission cost is imposed, where the average transmission cost is defined by

$\frac{1}{n+1} \sum_{i=0}^{n} \mathbf{E}\{\gamma(A_i, B_{i-1})\} = \kappa,$

where κ is a constant. In this example, the average transmission cost at time i may be:

$\mathbf{E}\{\gamma(A_i, B_{i-1})\} = P_{A_i,B_{i-1}}(0,0) + P_{A_i,B_{i-1}}(1,1) = P_{S_i}(0).$

According to the techniques discussed in section B, the capacity of the BSSC(α₁, β₁) with and without feedback and with average transmission cost is equal and may be expressed as:

$C_{A^\infty \to B^\infty}^{FB}(\kappa) = C_{A^\infty; B^\infty}^{nfb}(\kappa) = H(\lambda) - \kappa H(\alpha_1) - (1-\kappa) H(\beta_1),$

where

$\lambda = \alpha_1 \kappa + (1-\beta_1)(1-\kappa).$
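A short Python sketch (illustrative, not part of the disclosure) evaluating this capacity expression; the function name is an assumption.

```python
from math import log2

def binary_entropy(q):
    """H(q) in bits, with H(0) = H(1) = 0."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def capacity_bssc(alpha1, beta1, kappa):
    """Capacity of BSSC(alpha1, beta1) at average transmission cost kappa,
    equal with and without feedback per the expression above."""
    lam = alpha1 * kappa + (1 - beta1) * (1 - kappa)
    return (binary_entropy(lam) - kappa * binary_entropy(alpha1)
            - (1 - kappa) * binary_entropy(beta1))
```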

The capacity achieving channel input distribution without feedback is:

${P_{A_{i}|A_{i - 1}}^{*}\left( a_{i} \middle| a_{i - 1} \right)} = \begin{bmatrix}\frac{1 - \kappa - \gamma}{1 - {2\gamma}} & \frac{\kappa - \gamma}{1 - {2\gamma}} \\\frac{\kappa - \gamma}{1 - {2\gamma}} & \frac{1 - \kappa - \gamma}{1 - {2\gamma}}\end{bmatrix}$

where

$\gamma = \alpha_1 \kappa + \beta_1 (1-\kappa),$

and the capacity achieving channel input distribution with feedback is:

${P_{A_{i}|B_{i - 1}}^{*}\left( a_{i} \middle| b_{i - 1} \right)} = {\begin{bmatrix}\kappa & {1 - \kappa} \\{1 - \kappa} & \kappa\end{bmatrix}.}$

Now that the capacity and RDF are determined for this example, a computing device and/or operator may identify or construct an {encoder, decoder} pair that realizes the determined RDF and achieves the determined capacity. FIG. 14 illustrates example realizations of the {encoder, decoder} pair.

For

$\kappa = m, \quad \alpha_1 = \alpha, \quad \beta_1 = \beta,$

the {encoder, decoder} pair to be identified or constructed must satisfy:

$C_{A^\infty \to B^\infty}^{FB}(\kappa) = H(p) - mH(\alpha) - (1-m)H(\beta) = R^{na}(D).$
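This matching condition can be checked numerically. The self-contained sketch below (illustrative only; the values p = 0.3 and D = 0.1 are arbitrary examples) applies κ = m, α₁ = α, β₁ = β and confirms that the capacity expression collapses to R^na(D). Algebraically, λ = ακ + (1 − β)(1 − κ) reduces to 1 − p under this matching, so H(λ) = H(p).

```python
from math import log2

def binary_entropy(q):
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

p, D = 0.3, 0.1                                  # illustrative values
m = 1 - p - D + 2 * p * D
alpha = (1 - p) * (1 - D) / m
beta = p * (1 - D) / (p + D - 2 * p * D)

# R^na(D) of the BSMS(p):
rna = (binary_entropy(p) - m * binary_entropy(alpha)
       - (1 - m) * binary_entropy(beta))

# Capacity of the matched BSSC at kappa = m, alpha1 = alpha, beta1 = beta:
kappa = m
lam = alpha * kappa + (1 - beta) * (1 - kappa)   # equals 1 - p here
cap = (binary_entropy(lam) - kappa * binary_entropy(alpha)
       - (1 - kappa) * binary_entropy(beta))

assert abs(rna - cap) < 1e-12                    # R^na(D) = C(kappa)
```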

For encoding without feedback (as shown in part (a) of FIG. 14), a parameter of the optimal channel input distribution may be equal to the parameter of the conditional distribution of the source:

$\frac{1 - \kappa - \gamma}{1 - 2\gamma} = p.$

In this example, the {encoder, decoder} pair satisfying the above condition may be the identity mapping on the respective inputs. That is,

$a_i = x_i, \quad y_i = b_i;$

in other words, uncoded transmission is optimal. This result may imply that no encoding is performed and no decoding is performed. Also, a computing device and/or operator may evaluate the minimum excess distortion achievable. For encoding with feedback (shown in part (b) of FIG. 14), the {encoder, decoder} pair is given by:

$a_i = x_i \oplus b_{i-1}, \quad y_i = b_i, \quad i = 0, 1, \ldots$

Although this case is presented above by way of example, other {encoder, decoder} pairs can be computed using precisely the same methodology (e.g., by invoking the examples of optimal channel input distributions for channels with memory and feedback and transmission cost, and by using the expression of the finite time nonanticipative RDF).

F. Example Computing Device

FIG. 15 illustrates an example computing device 1650, which computing device 1650 may be implemented as the source 102, may implement the encoder 104 or the decoder 108, and/or may implement at least some of the functionality discussed with reference to FIGS. 3, 4, 5, 8, 9, and 12. The computing device 1650 may include one or more central processing units (CPUs) or processing units 1651 (which may be called a microcontroller or a microprocessor), a system memory 1652a and 1652b, and a system bus 1654 that couples various system components including the system memory 1652 to the processing units 1651. The system bus 1654 may include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, a Peripheral Component Interconnect (PCI) bus or a Mezzanine bus, and a Peripheral Component Interconnect Express (PCI-E) bus.

The computing device 1650 may include an assortment of computer-readable media. Computer-readable media may be any media that may be accessed by the computing device 1650. By way of example, and not limitation, the computer-readable media may include both volatile and nonvolatile media, and removable and non-removable media. Media may also include computer storage media and communication media. The computer-readable media may store information such as computer-readable instructions, program modules, data structures, or other data. Computer-storage media may include non-transitory media, such as a RAM 1652b, a ROM 1652a, EEPROM, optical storage disks, magnetic storage devices, and any other non-transitory medium which may be used to store computer-accessible information.

In an embodiment, the ROM 1652a and/or the RAM 1652b may store instructions that are executable by the processing unit 1651. For example, a basic input/output system (BIOS), containing algorithms to transfer information between components within the computing device 1650, may be stored in the ROM 1652a. Data or program modules that are immediately accessible or are presently in use by the processing unit 1651 may be stored in the RAM 1652b. Data normally stored in the RAM 1652b while the computing device 1650 is in operation may include an operating system, application programs, program modules, and program data. In particular, the RAM 1652b may store one or more applications 1660 including one or more routines 1662, 1664, and 1666 implementing the functionality of the example methods 300, 400, 500, 800, 900, and 1200.

The computing device 1650 may also include other storage media, such as a hard disk drive that may read from or write to non-removable, non-volatile magnetic media, a magnetic disk drive that reads from or writes to a removable, non-volatile magnetic disk, and an optical disk drive that reads from or writes to a removable, non-volatile optical disk. Other storage media that may be used include magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, and solid state ROM. The hard disk drive may be connected to the system bus 1654 through a non-removable memory interface such as interface 1674. A magnetic disk drive and optical disk drive may be connected to the system bus 1654 by a removable memory interface, such as interface 1690.

A user or operator may interact with the computing device 1650 through input devices such as a keyboard or a pointing device (e.g., a mouse). A user input interface 1702 may be coupled to the system bus 1654 to allow the input devices to communicate with the processing unit 1651. A display device such as a monitor 1722 may also be connected to the system bus 1654 via a video interface (not shown).

The computing device 1650 may operate in a networked environment using logical connections to one or more remote computing devices, for example. The remote computing device may be a personal computer (PC), a server, a router, or another common network node. The remote computing device typically includes many or all of the previously-described elements regarding the computing device 1650. Logical connections between the computing device 1650 and one or more remote computing devices may include a wide area network (WAN). A typical WAN is the Internet. When used in a WAN, the computing device 1650 may include a modem or other means for establishing communications over the WAN. The modem may be connected to the system bus 1654 via the network interface 1725, or other mechanism. In a networked environment, program modules depicted relative to the computing device 1650 may be stored in a remote memory storage device. As may be appreciated, other means of establishing a communications link between the computing device 1650 and a remote computing device may be used.

Additional Considerations

Upon reading this disclosure, those of ordinary skill in the art will appreciate still additional alternative structural and functional designs for characterizing channels and capacities, determining optimal input distributions, designing encoders and decoders, etc. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.

The particular features, structures, or characteristics of any specific embodiment may be combined in any suitable manner and in any suitable combination with one or more other embodiments, including the use of selected features without corresponding use of other features. In addition, many modifications may be made to adapt a particular application, situation or material to the essential scope and spirit of the present invention. It is to be understood that other variations and modifications of the embodiments of the present invention described and illustrated herein are possible in light of the teachings herein and are to be considered part of the spirit and scope of the present invention. By way of example, and not limitation, the present disclosure contemplates at least the following aspects:

1. A method for characterizing a capacity of a channel with memory and feedback, the method comprising:

defining a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols;

determining a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization; and

solving, by one or more processors of a specially configured computing device, the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.

2. The method of aspect 1, wherein determining the representation of the capacity includes:

determining, using stochastic optimal control techniques, a subset of distributions which includes the channel input distribution that achieves the capacity.

3. The method of aspect 2, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes:

determining, based on the first subset of distributions and using a variational equality of conditional mutual information or mutual information, an upper bound to identify a second subset of distributions which includes the channel input distribution that achieves the capacity, wherein the second subset of distributions is smaller than the first subset of distributions.

4. The method of any one of aspects 1 to 3, further comprising:

defining a transmission cost function, wherein the transmission cost function specifies a cost to transmit the information from the source to the destination and indicates a dependence on at least one of the past and present channel input symbols or the past channel output symbols, wherein determining the representation of the capacity includes determining the representation of the capacity based on the channel model and based on the transmission cost function.

5. The method of aspect 4, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes:

determining, using stochastic optimal control techniques, the first subset of distributions which includes the channel input distribution that achieves the capacity;

determining, based on the channel model and the transmission cost function, if an output of the channel or the transmission cost function is dependent on quantities other than the past channel input symbols and the past channel output symbols;

if the output of the channel or the transmission cost function is dependent on quantities other than the past channel input symbols and the past channel output symbols: determining, based on the first subset of distributions and using a variational equality of conditional mutual information, a second subset of distributions which includes the optimal channel input distribution, wherein the second subset of distributions is smaller than the first subset of distributions, and determining the representation of the capacity based on the second subset of distributions; and

if the output of the channel or the transmission cost function is not dependent on quantities other than the past channel input symbols and the past channel output symbols: determining the representation of the capacity based on the first subset of distributions.

6. The method of any one of aspects 1 to 5, wherein the representation of the capacity is a first representation of a finite-block length capacity, the method further comprising:

determining, based on the first representation of the finite-block length capacity, a second representation of the capacity, wherein the second representation is an upper bound on the first representation and represents the capacity for an infinite number of transmissions over the channel per unit time.

7. The method of any one of aspects 1 to 6, wherein the optimization is a maximization, and wherein solving the optimization of the representation of the capacity includes solving the maximization.

8. The method of any one of aspects 1 to 7, wherein solving the optimization includes solving the optimization using a dynamic programming algorithm.

9. The method of any one of aspects 1 to 8, wherein solving the optimization includes solving the optimization using a Blahut-Arimoto algorithm sequentially.

10. The method of any one of aspects 1 to 9, wherein solving the optimization includes:

determining a gain or reduction in the capacity for encoding with feedback as compared to encoding without feedback.

11. The method of any one of aspects 1 to 10, further comprising:

designing, by an encoder design procedure, a coding scheme based on the capacity of the channel determined by solving the optimization of the representation of the capacity, wherein the coding scheme utilizes the channel input distribution that achieves the capacity, and wherein the coding scheme satisfies a condition specifying that none of the information is lost in the coding scheme.

12. The method of aspect 11, further comprising:

configuring an encoder coupled to the channel to encode based on the coding scheme that satisfies the condition specifying that none of the information is lost in the coding scheme.

13. A system including:

one or more processors; and

one or more non-transitory memories, wherein the one or more non-transitory memories store computer-readable instructions that specifically configure the system such that, when executed by the one or more processors, the computer-readable instructions cause the system to:

receive a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols;

determine a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization; and

solve the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.

14. The system of aspect 13, wherein the computer-readable instructions further cause the system to:

determine a nonanticipative rate distortion function based on a received model of the source and based on a causal reproduction distribution, wherein the second representation and the third representation specify a rate at which symbols from the source should be transmitted to the destination.

15. The system of aspect 14, wherein the computer-readable instructions further cause the system to:

design a coding scheme based on the capacity of the channel determined by solving the optimization of the representation of the capacity and based on the nonanticipative rate distortion function, wherein the coding scheme utilizes the channel input distribution that achieves the capacity, and wherein the coding scheme satisfies a condition specifying that none of the information is lost in the coding scheme; and

configure an encoder coupled to the channel to encode based on the coding scheme, wherein the coding scheme simultaneously compresses and encodes information transmitted over the channel.

16. The system of aspect 15, wherein designing the coding scheme includes designing the coding scheme by joint source channel coding.

17. The system of any one of aspects 13 to 16, wherein determining the representation of the capacity includes:

determining, using stochastic optimal control techniques, a subset of distributions which includes the channel input distribution that achieves the capacity.

18. The system of aspect 17, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes:

determining, based on the first subset of distributions and using a variational equality of conditional mutual information or mutual information, an upper bound to identify a second subset of distributions which includes the channel input distribution that achieves the capacity, wherein the second subset of distributions is smaller than the first subset of distributions.

19. The system of any one of aspects 13 to 18, wherein the representation of the capacity is a first representation of a finite-block length capacity, and wherein the computer-readable instructions further cause the system to:

determine, based on the first representation of the finite-block length capacity, a second representation of the capacity, wherein the second representation is an upper bound on the first representation and represents the capacity for an infinite number of transmissions over the channel per unit time.

20. The system of any one of aspects 13 to 19, wherein the channel model is a probabilistic map.

21. The system of any one of aspects 13 to 20, wherein the channel model is a function of the past and present channel input symbols.

What is claimed is:

1. A method for characterizing a capacity of a channel with memory and feedback, the method comprising: defining a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols; determining a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization; and solving, by one or more processors of a specially configured computing device, the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions and a per unit time limit of the capacity of the channel.

2. The method of claim 1, wherein determining the representation of the capacity includes: determining, using stochastic optimal control techniques, a subset of distributions which includes the channel input distribution that achieves the capacity.
3. The method of claim 2, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes: determining, based on the first subset of distributions and using a variational equality of conditional mutual information or mutual information, an upper bound to identify a second subset of distributions which includes the channel input distribution that achieves the capacity, wherein the second subset of distributions is smaller than the first subset of distributions.
4. The method of claim 1, further comprising: defining a transmission cost function, wherein the transmission cost function specifies a cost to transmit the information from the source to the destination and indicates a dependence on the past and present channel input symbols and the past channel output symbols, wherein determining the representation of the capacity includes determining the representation of the capacity based on the channel model and based on the transmission cost function.
5. The method of claim 4, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes: determining, using stochastic optimal control techniques, the first subset of distributions which includes the channel input distribution that achieves the capacity; determining, based on the channel model and the transmission cost function, if an output of the channel or the transmission cost function is dependent on quantities other than the past channel input symbols and the past channel output symbols; if the output of the channel or the transmission cost function is dependent on quantities other than the past channel input symbols and the past channel output symbols, determining, based on the first subset of distributions and using a variational equality of conditional mutual information or mutual information, a second subset of distributions which includes the optimal channel input distribution, wherein the second subset of distributions is smaller than the first subset of distributions, and determining the representation of the capacity based on the second subset of distributions; and if the output of the channel or the transmission cost function is not dependent on quantities other than the past channel input symbols and the past channel output symbols, determining the representation of the capacity based on the first subset of distributions.
6. The method of claim 1, wherein the representation of the capacity is a first representation of a finite-block length capacity, the method further comprising: determining, based on the first representation of the finite-block length capacity, a second representation of the capacity, wherein the second representation is an upper bound on the first representation and represents the capacity for an infinite number of transmissions over the channel per unit time.
7. The method of claim 1, wherein the optimization is a maximization, and wherein solving the optimization of the representation of the capacity includes solving the maximization.
8. The method of claim 1, wherein solving the optimization includes solving the optimization using a dynamic programming algorithm.

9. The method of claim 1, wherein solving the optimization includes one of solving the optimization independently using a Blahut-Arimoto algorithm sequentially or solving the optimization using the Blahut-Arimoto algorithm in combination with a dynamic programming algorithm.
10. The method of claim 1, wherein solving the optimization includes: determining a gain in the capacity for encoding with feedback as compared to encoding without feedback.
11. The method of claim 1, further comprising: designing, by an encoder design procedure, a coding scheme based on the capacity of the channel determined by solving the optimization of the representation of the capacity, wherein the coding scheme utilizes the channel input distribution that achieves the capacity, and wherein the coding scheme satisfies a condition specifying that none of the information is lost in the coding scheme.
12. The method of claim 11, further comprising: configuring an encoder coupled to the channel to encode based on the coding scheme that satisfies the condition specifying that none of the information is lost in the coding scheme.
13. A system comprising: one or more processors; and one or more non-transitory memories, wherein the one or more non-transitory memories store computer-readable instructions that specifically configure the system such that, when executed by the one or more processors, the computer-readable instructions cause the system to: receive a channel model corresponding to the channel, wherein: the channel is utilized to transmit information from a source to a destination, and the channel model indicates a dependence of outputs from the channel on past and present channel input symbols and on past channel output symbols; determine a representation of the capacity based on the channel model and based on a channel input distribution that achieves the capacity, wherein the representation represents the capacity for a finite number of transmissions over the channel, and wherein the representation includes an optimization; and solve the optimization of the representation of the capacity to determine the capacity of the channel for the finite number of transmissions.
14. The system of claim 13, wherein the computer-readable instructions further cause the system to: determine a nonanticipative rate distortion function based on a received model of the source and based on a causal reproduction distribution, wherein the second representation and the third representation specify a rate at which symbols from the source should be transmitted to the destination.
15. The system of claim 14, wherein the computer-readable instructions further cause the system to: design a coding scheme based on the capacity of the channel determined by solving the optimization of the representation of the capacity and based on the nonanticipative rate distortion function, wherein the coding scheme utilizes the channel input distribution that achieves the capacity, and wherein the coding scheme satisfies a condition specifying that none of the information is lost in the coding scheme; and configure an encoder coupled to the channel to encode based on the coding scheme, wherein the coding scheme simultaneously compresses and encodes information transmitted over the channel.
16. The system of claim 15, wherein designing the coding scheme includes designing the coding scheme by joint source channel coding.

17. The system of claim 13, wherein determining the representation of the capacity includes: determining, using stochastic optimal control techniques, a subset of distributions which includes the channel input distribution that achieves the capacity.
18. The system of claim 17, wherein the subset of distributions is a first subset of distributions, and wherein determining the representation of the capacity further includes: determining, based on the first subset of distributions and using a variational equality of conditional mutual information or mutual information, an upper bound to identify a second subset of distributions which includes the channel input distribution that achieves the capacity, wherein the second subset of distributions is smaller than the first subset of distributions.
19. The system of claim 13, wherein the representation of the capacity is a first representation of a finite-block length capacity, and wherein the computer-readable instructions further cause the system to: determine, based on the first representation of the finite-block length capacity, a second representation of the capacity, wherein the second representation is an upper bound on the first representation and represents the capacity for an infinite number of transmissions over the channel per unit time.

20. The system of claim 13, wherein the channel model is a probabilistic map.
21. The system of claim 13, wherein the channel model is a function of the past and present channel input symbols.